Qax: A library for simplifying research prototyping #16516
davisyoshida asked this question in Show and tell
🦆Qax🦆
I've been working on a tool called Qax, which I solicited feedback on a couple of weeks ago. I've made a bunch of ergonomics improvements since then, and I think it's ready for people to start using for prototyping research ideas.
The main idea of this tool is to make it easy to implement anything which falls under "thing which represents a tensor but doesn't actually instantiate it." The motivating use cases were LoRA and quantized matrices, but I've run into several more since, and @patrick-kidger has suggested a bunch as well. You can basically think of it as a convenience wrapper around `Tracer`s for use cases which don't need the full power of the tracing system.

I have a Twitter thread going over the core idea, but I'll expand on it a bit here so people can give feedback. Please do let me know if you have any suggestions. (I already have some backend changes planned based on conversations with Patrick, but they shouldn't affect the frontend API.)
Mini-LoRA
Here's a minimal example: implementing LoRA. I've golfed it in order to show what the essential parts are:
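As a rough illustration of the shape of the idea, here's a conceptual numpy stand-in (this is not Qax's actual API; the `LoraMatrix` class and `matmul` handler here are hypothetical names, and real Qax dispatches at the JAX-primitive level rather than via `isinstance` checks):

```python
import numpy as np

class LoraMatrix:
    """Represents W + A @ B without ever forming that sum."""
    def __init__(self, w, a, b):
        self.w, self.a, self.b = w, a, b

    def materialize(self):
        # Fallback: build the dense matrix when no cheaper path applies.
        return self.w + self.a @ self.b

def matmul(x, m):
    # Stand-in "primitive handler": intercept left-matmuls against a
    # LoraMatrix and compute x @ W + (x @ A) @ B, which never
    # instantiates the dense W + A @ B.
    if isinstance(m, LoraMatrix):
        return x @ m.w + (x @ m.a) @ m.b
    return x @ m

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
a = rng.normal(size=(8, 2))
b = rng.normal(size=(2, 8))
x = rng.normal(size=(4, 8))

lora = LoraMatrix(w, a, b)
# The low-rank path agrees with materializing first.
assert np.allclose(matmul(x, lora), x @ lora.materialize())
```

The point of the sketch is only the division of labor: a bag of parameters, a `materialize()` fallback, and a handler for the one operation worth special-casing.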
The thought here is that ideas like LoRA can be fully specified by answering three questions: what data the type stores, how it materializes into a dense array, and how primitives like `dot_general` should act on it.
Slightly-less-mini-LoRA
Here's an ungolfed version which shows a few more features, as well as not crashing when `dot_general`s other than left matmuls happen:

I didn't have to specify the `shape` and `dtype` which define this array's aval here, because they're automatically derived from the `materialize()` method. In cases where they can't be derived that way (e.g. a symbolic zero needs to already know its shape/dtype in order to be materialized), they can be passed as keyword args at initialization.

The way to use the type defined above is with the `qax.use_implicit_args` transform. It transforms a function which takes JAX types into one that takes an `ImplicitArray` in any position/keyword where an `ImplicitArray` was passed.

The upshot of this is that you don't need to modify the underlying model code, and you get handed a function which is compatible with `jit`, `grad`, and `vmap`. (I'm not sure how it plays with multi-device stuff yet, since I do all my development on two GPUs which aren't the same model...)
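To make the "no model changes" idea concrete, here's a conceptual numpy stand-in (again hypothetical, not Qax's API: real Qax intercepts JAX primitives under `use_implicit_args`; here plain operator overloading plays the same role):

```python
import numpy as np

class LoraMatrix:
    # Tell numpy to defer binary ops to this class, so that
    # ndarray @ LoraMatrix falls through to __rmatmul__ below.
    __array_ufunc__ = None

    def __init__(self, w, a, b):
        self.w, self.a, self.b = w, a, b

    def __rmatmul__(self, x):
        # Intercept x @ self without materializing W + A @ B.
        return x @ self.w + (x @ self.a) @ self.b

    def materialize(self):
        return self.w + self.a @ self.b

def model(x, w):
    # "Unmodified model code": written as if w were a dense array.
    return np.tanh(x @ w)

rng = np.random.default_rng(1)
w, a, b = (rng.normal(size=s) for s in [(8, 8), (8, 2), (2, 8)])
x = rng.normal(size=(4, 8))
lora = LoraMatrix(w, a, b)

dense_out = model(x, lora.materialize())
implicit_out = model(x, lora)  # same model code, implicit weight
assert np.allclose(dense_out, implicit_out)
```

The design point this mirrors is that the interception happens at the operation level, so the function being wrapped never needs to know the implicit type exists.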
The code below shows how to use the above LoRA type with a HuggingFace model. Most of it is just standard, so I've marked the two Qax-related changes.
It works!
Symbolic identity matrix
As another demo, here's an identity matrix which only stores a shape but no actual data:
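A conceptual numpy stand-in of the same trick (hypothetical names again, not Qax's API; the real version hangs off JAX's `dot_general`):

```python
import numpy as np

class Eye:
    """Symbolic identity: stores only a shape and dtype, no data."""
    def __init__(self, n, dtype=np.float64):
        self.shape = (n, n)
        self.dtype = dtype

    def materialize(self):
        # Only built if some operation actually needs the dense matrix.
        return np.eye(self.shape[0], dtype=self.dtype)

def matmul(x, m):
    if isinstance(m, Eye):
        if x.shape[-1] != m.shape[0]:
            # Mirror the post's NotImplemented-style check: bail out so
            # the fallback (materialize-and-multiply) can take over.
            return NotImplemented
        return x  # identity passthrough: zero FLOPs
    return x @ m

x = np.arange(12, dtype=np.float64).reshape(3, 4)
assert matmul(x, Eye(4)) is x
assert matmul(x, Eye(5)) is NotImplemented
```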
Matmuls which pass the `NotImplemented` check above will take zero FLOPs and just pass their argument through:

Nesting ImplicitArrays
`ImplicitArray` instances can be arbitrarily nested, so combining this with `LoraMatrix` gives a representation of a low-rank update on top of the symbolic identity; no extra handlers are necessary to support this:
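In the numpy stand-in vocabulary, nesting looks like giving `LoraMatrix` a symbolic identity as its base weight, so the composite represents I + A @ B (hypothetical classes, not Qax's API):

```python
import numpy as np

class Eye:
    def __init__(self, n):
        self.shape = (n, n)

    def materialize(self):
        return np.eye(self.shape[0])

class LoraMatrix:
    def __init__(self, w, a, b):
        self.w, self.a, self.b = w, a, b

    def materialize(self):
        # Recursively materialize a nested implicit base weight.
        w = self.w.materialize() if hasattr(self.w, "materialize") else self.w
        return w + self.a @ self.b

def matmul(x, m):
    if isinstance(m, LoraMatrix):
        # Recurse on the base weight so nested implicit types compose.
        return matmul(x, m.w) + (x @ m.a) @ m.b
    if isinstance(m, Eye):
        return x  # identity passthrough, zero FLOPs
    return x @ m

rng = np.random.default_rng(2)
a = rng.normal(size=(6, 2))
b = rng.normal(size=(2, 6))
x = rng.normal(size=(3, 6))

nested = LoraMatrix(Eye(6), a, b)  # represents I + A @ B
assert np.allclose(matmul(x, nested), x @ nested.materialize())
```

The only thing making nesting work here is that the handler recurses instead of assuming its operand is dense, which is the same compositional property the post describes.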
I already used Qax to implement the quantization method I was analyzing in this repo/arxiv note as well. Being able to test stuff on random HuggingFace models or my own Haiku models without patching model code all the time was a really nice change of pace.
A couple more examples:
TODOs
👋
Thanks for taking the time to read this! Please let me know if you have any questions or comments.