dsevero/README.md

Research Engineer

Meta - Fundamental AI Research (FAIR) Labs

I am originally from Florianópolis (Brazil), but I've lived in New Jersey, Orlando, Toronto (now), and São Paulo, as well as smaller cities in the south of Brazil. I spent 2022 at Google AI as a Student Researcher, working with Lucas Theis and Johannes Ballé.

Google Scholar | X (Twitter) | CV

Research Interests

I'm interested in information theory, machine learning, and AI.

Compression of non-sequential data

Lossless compression algorithms typically preserve the ordering in which data points are compressed. However, there are data types where order is not meaningful, such as collections of files, rows in a database, nodes in a graph, and, notably, datasets in machine learning applications.

Compressing with traditional algorithms is possible if we pick an order for the elements and communicate the corresponding ordered sequence. However, unless the order information is somehow removed during the encoding process, this procedure will be sub-optimal, because the order contains information and therefore more bits are used to represent the source than are truly necessary.
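
As a rough illustration of how much the ordering can cost, the sketch below counts the bits needed to single out one ordering of a multiset, assuming every distinct ordering is equally likely (for example, when the elements are i.i.d.). The function names are hypothetical and are not taken from any of the papers or repositories listed here.

```python
import math
from collections import Counter


def log2_factorial(k):
    """Return log2(k!) using the log-gamma function to avoid huge integers."""
    return math.lgamma(k + 1) / math.log(2)


def order_bits(items):
    """Bits needed to identify one ordering of a multiset.

    Assumption: all distinct orderings are equally likely, so the cost is
    log2 of the multinomial coefficient n! / (m_1! * m_2! * ...), where
    m_i is the multiplicity of the i-th distinct element.
    """
    counts = Counter(items)
    n = sum(counts.values())
    return log2_factorial(n) - sum(log2_factorial(m) for m in counts.values())


if __name__ == "__main__":
    print(order_bits([1, 2, 3]))        # three distinct elements
    print(order_bits(["a", "a", "b"]))  # one repeated element
```

For three distinct elements this gives log2(3!) ≈ 2.58 bits. With a repeated element, as in `["a", "a", "b"]`, only 3 orderings are distinguishable, so the cost drops to about 1.58 bits. Under the stated assumption, these are the bits a sequence-based code spends on order that an order-free code could save.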

In previous work, we gave a formal definition of non-sequential objects as random sets of equivalent sequences, which we call Combinatorial Random Variables (CRVs), as well as a general class of computationally efficient algorithms that achieve the optimal compression rate of CRVs: Random Permutation Codes (RPCs). Specialized RPCs are given for the case of multisets (Random Order Coding), graphs (Random Edge Coding), and partitions/clusterings (under review), providing new algorithms for compression of databases, social networks, and web data in the JSON file format.

Currently, I'm interested in the application of RPCs to reduce the memory footprint of vector databases.

Latest News

April 2024 - I've moved to Montréal to start as a Research Engineer at FAIR Labs!

March 2024 - LASI and Shuffle Coding were accepted to ICLR 2024.

August 2023 - I started a second internship at FAIR (Meta AI) in information theory and generative modelling with Matthew Muckley.

April 2023 - Random Edge Coding and Action Matching were accepted to ICML 2023.

Tutorials and Workshops

Recommended readings (not my authorship)

Selected Publications and Preprints

For a complete list, please see my Google Scholar profile.

The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric
Daniel Severo, Lucas Theis, Johannes Ballé
International Conference on Learning Representations (ICLR), 2024

Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs
Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani
International Conference on Machine Learning (ICML), 2023

Action Matching: Learning Stochastic Dynamics from Samples
Kirill Neklyudov, Rob Brekelmans, Daniel Severo, Alireza Makhzani
International Conference on Machine Learning (ICML), 2023

Compressing Multisets with Large Alphabets using Bits-Back Coding
Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani, Karen Ullrich
IEEE Journal on Selected Areas in Information Theory, 2023
Best Paper Award at NeurIPS Workshop on DGMs, 2021

Pinned

  1. facebookresearch/NeuralCompression (Public)

    A collection of tools for neural compression enthusiasts.

    Python, 482 stars, 42 forks

  2. facebookresearch/multiset-compression (Public archive)

    Official code accompanying the arXiv paper Compressing Multisets with Large Alphabets

    Python, 26 stars, 4 forks

  3. j-towns/craystack (Public)

    Compression tools for machine learning researchers

    Python, 82 stars, 8 forks

  4. Linear-Autoregressive-Similarity-Index (Public)

    Code for "The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric"

    Python, 21 stars, 1 fork

  5. craystack (Public)

    Forked from j-towns/craystack

    Compression tools for machine learning researchers

    Python