
[stdlib] Update stdlib corresponding to 2024-05-10 nightly/mojo #2615

Merged: JoeLoser merged 24 commits into modularml:nightly on May 11, 2024

JoeLoser (Collaborator)

This updates the stdlib with the internal commits corresponding to today's nightly release: mojo 2024.5.1102.

bethebunny and others added 24 commits May 11, 2024 02:20
Paired with @ConnorGray

Dictionary performance is pretty bad. Much of this poor performance
comes from accidental copies that survived from before Mojo had
references. Previously, constructing a dict with ~20k strings took
minutes; now dictionaries with ~10M strings are usable. This is just
low-hanging fruit; there's still plenty left to optimize for further
performance.

- Adds a `Dict.__get_ref(i)` which allows refitem semantics without a
copy
- Updates dict `insert` and `__setitem__` to take owned values
- Updates dict `insert`, `pop`, resize, and compact to remove any copies
- Changes `Optional.take` and `Variant.take` to take `inout self` rather
than `owned self`
  - Changes `Optional.take` to reset the state to empty
- `Variant.take` leaves the variant in an uninitialized state and should be
renamed to make that explicitly unsafe. Leaving this as a follow-up.
- Updates the dict tests to validate that creation, insertion, and
access do the minimal required copies.
- Fixes an existing uncaught bug with dictionary compaction and adds a unit
test for that case.

MODULAR_ORIG_COMMIT_REV_ID: e1fadfc0a4ab9f7a1afba35dea4e26a99287a387
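The new `Optional.take` contract (a mutating call that hands back the value and leaves the optional empty) can be modelled in Python roughly as follows. This is only an analogy: `Opt` is a hypothetical class, and Python has no `inout`/`owned` distinction, so mutation of `self` stands in for `inout self`.

```python
class Opt:
    """Hypothetical Python stand-in for Mojo's Optional, used only to
    illustrate 'take by mutation' semantics."""

    _EMPTY = object()  # sentinel marking the empty state

    def __init__(self, value=_EMPTY):
        self._value = value

    def __bool__(self):
        return self._value is not Opt._EMPTY

    def take(self):
        # Mutates self (the analogue of `inout self`) instead of consuming it:
        # return the held value and reset this optional to empty.
        if not self:
            raise ValueError("take() called on an empty Opt")
        value, self._value = self._value, Opt._EMPTY
        return value


cache = Opt("value")
taken = cache.take()  # "value"; `cache` is now empty
```

Because the optional stays alive (just empty) after `take`, the caller can keep using the same storage, which is the point of moving away from `owned self`.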
… (#39730)

[External] [stdlib] Refactor SIMD tests to directly call __floordiv__

Modify the SIMD tests for `__floordiv__` to invoke this dunder method
directly instead of using the `//` operator. This approach is
recommended for unit testing magic methods of basic numeric types to
prevent unintentional implicit conversions.

Co-authored-by: Peyman Barazandeh <peymanb@gmail.com>
Closes modularml#2602
MODULAR_ORIG_COMMIT_REV_ID: 202bd98f7a272895409df7152d3827754b84d216
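Python has the same testing pitfall, which makes it easy to demonstrate why the `__floordiv__` tests now call the dunder directly. The operator can quietly dispatch to some other method, so a test written with `//` may not exercise the method under test at all. Mojo's implicit conversions work differently, so treat this as an analogy only.

```python
lhs, rhs = 7, 2.0

# int.__floordiv__ returns NotImplemented for a float operand, so the
# operator silently falls back to float.__rfloordiv__ and still "works".
via_operator = lhs // rhs

# Calling the dunder directly exercises exactly the method under test.
via_dunder = lhs.__floordiv__(rhs)

print(via_operator)  # 3.0
print(via_dunder)    # NotImplemented
```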
[External] [stdlib] Add `Comparable` trait

Add a `Comparable` trait for testing conformance to the comparison operators.
Explicitly conform `FloatLiteral`, `IntLiteral`, `Int`, and `Set`.

Co-authored-by: Helehex <Helehex@gmail.com>
Closes modularml#2517
MODULAR_ORIG_COMMIT_REV_ID: 23fe6cfb61861f4115d38fa195876bce48f248cd
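A `Comparable` trait lets generic code require only "this type supports ordering". A rough Python analogue uses `typing.Protocol`. The exact method set of the Mojo trait is not spelled out here, so this sketch assumes it covers the four ordering operators. `larger` is a hypothetical generic function.

```python
from typing import Protocol, TypeVar


class Comparable(Protocol):
    """Rough Python analogue of a Comparable trait: any type that
    provides the four ordering dunders."""

    def __lt__(self, other, /) -> bool: ...
    def __le__(self, other, /) -> bool: ...
    def __gt__(self, other, /) -> bool: ...
    def __ge__(self, other, /) -> bool: ...


T = TypeVar("T", bound=Comparable)


def larger(a: T, b: T) -> T:
    """Generic function that relies only on the Comparable interface."""
    return b if a < b else a


print(larger(3, 5))         # 5
print(larger({1}, {1, 2}))  # {1, 2}  (sets order by subset)
```

Including `Set` in the conformance list mirrors the last line: set ordering is a partial order (subset), but it still provides the same operators.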
MODULAR_ORIG_COMMIT_REV_ID: 523b0571580544d0f8728d645b5954c06ab121ef
This is an internal only implementation detail, not something that
should be tested.

MODULAR_ORIG_COMMIT_REV_ID: 45d09e96eb287a040e32dc4df2507fcf32332290
[External] [Proposal] Improve the hash module

This proposal is based on discussion started in
modularml#1744

Co-authored-by: Maxim Zaks <maxim.zaks@gmail.com>
Closes modularml#2250
MODULAR_ORIG_COMMIT_REV_ID: 692c7d5940b8c88e83ef895b0be26a33a06ad941
…39560)

[External] [stdlib] Add `InlineList` struct (stack-allocated List)

This struct is very useful for implementing SSO (small string optimization); it's related to
* modularml#2467
* modularml#2507

If this is merged, I can take advantage of this in my PR that has the
SSO POC

About `InlineFixedVector`: `InlineList` is different. Notably,
`InlineList` has its capacity decided at compile time, and there is no
heap allocation (unless the elements have heap-allocated data, of
course).

`InlineFixedVector` stores the first N elements on the stack and the
remaining elements on the heap. Since the elements are not all in the
same place, the data is not contiguous, which makes it hard to work
with pointers there.

Co-authored-by: Gabriel de Marmiesse <gabriel.demarmiesse@datadoghq.com>
Closes modularml#2587
MODULAR_ORIG_COMMIT_REV_ID: 86df7b19f0f38134fbaeb8a23fe9aef27e47c554
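Python has no stack allocation, so the following sketch only models the capacity behaviour of the `InlineList` described above. Storage for `capacity` slots is reserved once and never grows, so the elements stay contiguous and never move. `FixedCapacityList` is a hypothetical name. It raises an error on overflow; what the real Mojo struct does when full is not stated here.

```python
class FixedCapacityList:
    """Hypothetical sketch of a fixed-capacity list: all slots are
    reserved up front and the backing storage is never reallocated."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity  # reserved once, never resized
        self._size = 0

    def __len__(self) -> int:
        return self._size

    def append(self, value) -> None:
        # No growth path: a full list is an error rather than a reallocation.
        if self._size == len(self._slots):
            raise IndexError("FixedCapacityList is full")
        self._slots[self._size] = value
        self._size += 1

    def __getitem__(self, index: int):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[index]
```

Because the capacity is fixed, an element's position never changes, which is what makes pointers into the data safe to hold (the problem noted for `InlineFixedVector`).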
[External] [stdlib] Implement `mkdir` and `rmdir`

Add functionality to the `os` module for creating and
removing directories via `mkdir` and `rmdir`.

Signed-off-by: Artemio Garza Reyna <artemiogr97@gmail.com>

Co-authored-by: Artemio Garza Reyna <artemiogr97@gmail.com>
Closes modularml#2430
MODULAR_ORIG_COMMIT_REV_ID: 8571848227dfd72f1672699a227e21354d5cf3e1
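For reference, Python's `os.mkdir` and `os.rmdir` show the behaviour the new Mojo functions presumably follow. Whether the Mojo signatures (for example, the `mode` parameter) match Python's exactly is an assumption.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "example_dir")
    os.mkdir(path, mode=0o755)   # create a single directory (parents must exist)
    created = os.path.isdir(path)
    os.rmdir(path)               # remove it; the directory must be empty
    removed = not os.path.exists(path)

print(created, removed)  # True True
```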
…st` (#39734)

[External] [stdlib] Support `__add__` and `__mul__` operators for `List`

Fixes modularml#2589

Co-authored-by: bgreni <42788181+bgreni@users.noreply.github.com>
Closes modularml#2590
MODULAR_ORIG_COMMIT_REV_ID: e08d28e0b92394a419ff4d5b231b969ec74635e1
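The expected semantics of `+` and `*` on `List` follow Python's own list: concatenation and repetition, each producing a new list. The minimal wrapper below spells those out. `SimpleList` is hypothetical, and it is an assumption that Mojo's `List` matches Python in every edge case (such as a non-positive repeat count).

```python
class SimpleList:
    """Hypothetical minimal list showing `+` (concatenation) and
    `*` (repetition) with Python-list semantics."""

    def __init__(self, *items):
        self.items = list(items)

    def __add__(self, other: "SimpleList") -> "SimpleList":
        # New list: all of self's elements followed by all of other's.
        return SimpleList(*self.items, *other.items)

    def __mul__(self, count: int) -> "SimpleList":
        # New list: self's elements repeated `count` times;
        # a count of zero or less yields an empty list.
        result = SimpleList()
        for _ in range(count):
            result.items.extend(self.items)
        return result


print((SimpleList(1, 2) + SimpleList(3)).items)  # [1, 2, 3]
print((SimpleList(1, 2) * 2).items)              # [1, 2, 1, 2]
```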
This operation directly accesses the callback pointer inside the
coroutine frame, so the stdlib doesn't have to perform the offset
computation. This abstracts the location of the callback from stdlib
code.

MODULAR_ORIG_COMMIT_REV_ID: d8bef12e4230e6a8c04d2ecd5139091e7d6f140d
Remove unused import in `builtin/int.mojo`.

MODULAR_ORIG_COMMIT_REV_ID: 96f1f293285541ef4737e2155ba2ac82592786ef
- Use `def` instead of `fn` for Python pretty print
- Remove printing results
- Pass `type` to `scalar`

MODULAR_ORIG_COMMIT_REV_ID: 02564b680fd7448f122de1fa38ce736e98b940d9
This removes uses of the `co.promise` operation from the stdlib and uses
`co.get_results` to access the register-passable results of a coroutine.
This is more succinct.

MODULAR_ORIG_COMMIT_REV_ID: c1057fde5204a1913a33f96b294544e2fcdad04e
[Internal link]

[Internal link]

Switches the hash algorithm for SIMD values from DJB33XA to the
[ankerl::unordered_dense::hash](https://martin.ankerl.com/2022/08/27/hashmap-bench-01/#ankerlunordered_densehash-).
This substantially improves the distribution of hash values on
sequential values, resulting in dramatically shorter probe sequences.

In the above plots, performance is measured and averaged on string keys
and sequential integer keys, first for the 24.3 dict/hash
implementations, then with the changes to Dict in
[Internal link] and finally after this
change (called 24.4). Large maps should see 10x or better improvements
with most key types, and even moderately sized maps with integer keys
should see much larger improvements.

MODULAR_ORIG_COMMIT_REV_ID: a8383f6ed9271a737b635942cbe87361ddf3a5c5
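To make the switch concrete, here are both hash schemes side by side as Python sketches. The DJB function follows the classic DJBX33A recipe (multiply by 33, add the next byte, seed 5381). The integer hash follows the wyhash-style `mix` used by `ankerl::unordered_dense`: a full 64x64-to-128-bit multiply with the high half folded onto the low half by xor. These are reconstructions from the public algorithms. Mojo's SIMD port may differ in constants and in how it walks SIMD elements.

```python
MASK64 = (1 << 64) - 1


def djbx33a(data: bytes) -> int:
    """Classic DJB hash: h = h * 33 + byte, seeded with 5381 (64-bit wrap)."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & MASK64
    return h


def ankerl_mix(a: int, b: int) -> int:
    """wyhash-style mix: 128-bit product, then low 64 bits xor high 64 bits."""
    product = (a & MASK64) * (b & MASK64)
    return (product & MASK64) ^ (product >> 64)


def ankerl_hash_u64(x: int) -> int:
    """Integer hash in the style of ankerl::unordered_dense (golden-ratio constant)."""
    return ankerl_mix(x, 0x9E3779B97F4A7C15)


for key in range(1000, 1004):
    print(key, hex(djbx33a(key.to_bytes(8, "little"))), hex(ankerl_hash_u64(key)))
```

The multiply-and-fold step lets every input bit influence the low bits that a power-of-two table uses to pick a bucket. That is why sequential keys spread out better and probe sequences get shorter.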
Add `__neg__` to `Bool` so `-True` and `-False` work.

MODULAR_ORIG_COMMIT_REV_ID: 0d3fe5daae5232416f71d14ff822480417cbbbc6
Replaced `__get_address_as_owned_value` with `{move_from,
destroy}_pointee`.

MODULAR_ORIG_COMMIT_REV_ID: e189c2b0762bf0103d58377cc185affcf18b1c5b
MODULAR_ORIG_COMMIT_REV_ID: 2ac07276b3dde9ee2a4e650c4f11d45d8740d957
[External] [stdlib] Introduce non-owning collection type

This PR introduces a non-owning collection type for contiguous arrays of
`CollectionElement`. With this type, we will be able to tie together the
array types in the standard library, and reduce necessary copies in some
places.

### Motivation

The pointer & length data structure is common across most mainstream
programming languages at this point
([Zig](https://ziglang.org/documentation/master/#Slices),
[Rust](https://doc.rust-lang.org/std/primitive.slice.html),
[C++](https://en.cppreference.com/w/cpp/container/span)). In order to
mirror APIs developers are used to, with the performance characteristics
they expect, this type will be required.

Notably, it now allows functions that can work on `List` _or_ `Array`
without an overload:
```mojo
fn copy[
    T: CollectionElement, lifetime: MutLifetime
](owned dst: Span[T, __mlir_attr.`1: i1`, lifetime], src: Span[T, _, _],):
    for i in range(len(src)):
        dst[i] = src[i]

fn main():
    var l = List[Int](1, 2, 3)
    var a = InlineArray[Int, 3](4, 5, 6)
    copy(Span(l), Span(a))
    for i in l:
        print(i[])
```
This code keeps the `List` and `InlineArray` alive long enough for the
copy to work, which is not true for `Buffer` or `DTypePointer`.

This type will be especially necessary for traits such as `Read` and
`Write`, which will want to take a `Span` as the relevant buffer, since
taking a `List` or `Array` is use case dependent for such functions.

Another motivation for this change is to reduce unnecessary copies on
methods such as slicing a `List`, which currently allocates a new list.
Ideally, we would just return a `Span` without making a copy. The same
could be said for `String.as_bytes()`, which does not need to allocate a
`List`.

**Note about naming**

I'm using the name `Span` to mirror C++'s `std::span<T>`. Rust, Go, and
Zig call this type a `slice`, but that term has an overloaded meaning in
Python. It's also nice that it's short as a name. Similarly, Mojo
already has the `Buffer` type. I'm open to other options if `Span` is
not the right fit.

Co-authored-by: Lukas Hermann <lukashermann28@gmail.com>
Closes modularml#2595
MODULAR_ORIG_COMMIT_REV_ID: 2fb5385d4d2f141150088ebd3e107cebdba761d2
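Python's `memoryview` is a close analogue of the pointer-and-length type proposed above. It is a non-owning view over another object's contiguous buffer, and slicing it does not copy. The sketch mirrors the `copy(Span(l), Span(a))` example and the "slice without allocating" motivation. `copy_into` is a hypothetical helper, and Python enforces safety with runtime buffer-export checks rather than Mojo-style lifetimes.

```python
from array import array


def copy_into(dst: memoryview, src: memoryview) -> None:
    """Element-wise copy through two non-owning views (Span-like)."""
    for i in range(len(src)):
        dst[i] = src[i]


# Mirror of the Mojo example: copy one container's elements into another's.
target = array("q", [1, 2, 3])
source = array("q", [4, 5, 6])
copy_into(memoryview(target), memoryview(source))
print(target.tolist())  # [4, 5, 6]

# Slicing a view does not copy: writes go straight to the owner.
numbers = array("q", [1, 2, 3, 4, 5])
tail = memoryview(numbers)[2:]
tail[0] = 30
print(numbers.tolist())  # [1, 2, 30, 4, 5]

# While a view is alive, the owner cannot be resized (that could move the
# storage out from under the view), so Python raises BufferError.
try:
    numbers.append(6)
    resize_blocked = False
except BufferError:
    resize_blocked = True
tail.release()
```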
- Test currently failing on AMD/Intel:
[Internal link]
- Basically these tests are really bad: they're just basic attempts to
verify that the hash function isn't totally degenerate, but because hash
values vary between architectures and these tests only check a couple of
values, there's a high amount of variance in the results, and they often
flake on related changes and need to be tweaked.
- Making this particular test case _much_ more relaxed for now.

MODULAR_ORIG_COMMIT_REV_ID: f9bc833ac24a7b11c70fca67cb0d1a3449b7b78e
[External] [stdlib] String comparisons implemented

For issue modularml#2346 (as an alternative to modularml#2378). All four comparisons
(`__lt__`, `__le__`, `__gt__`, and `__ge__`) use a single `__lt__`
comparison (instead of checking less/greater than plus a potential second
"equal to" check for `__le__` and `__ge__`). Sorry if this is considered
a duplicate PR; I only meant to give an alternative suggestion. This is
my first ever PR on GitHub.

`StringLiteral` also gets comparisons.

ORIGINAL_AUTHOR=Simon Hellsten
<56205346+siitron@users.noreply.github.com>
PUBLIC_PR_LINK=modularml#2409

---------

Co-authored-by: Simon Hellsten <56205346+siitron@users.noreply.github.com>
Closes modularml#2409
MODULAR_ORIG_COMMIT_REV_ID: b2ed4756c2741fd27387fa295515f4a7222e0ca5
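The "single `__lt__`" approach from the string-comparison PR works because the other three orderings are rewrites of less-than with the operands swapped and/or the result negated. A Python sketch follows. `Str` is a hypothetical wrapper, and comparing UTF-8 bytes lexicographically is an assumption about how Mojo's `String` orders values.

```python
class Str:
    """Hypothetical wrapper deriving all four orderings from one __lt__."""

    def __init__(self, text: str):
        self.text = text

    def __lt__(self, other: "Str") -> bool:
        # The one "real" comparison: lexicographic over UTF-8 bytes,
        # where a proper prefix sorts before the longer string.
        a, b = self.text.encode(), other.text.encode()
        for x, y in zip(a, b):
            if x != y:
                return x < y
        return len(a) < len(b)

    def __le__(self, other: "Str") -> bool:
        return not other < self  # a <= b  <=>  not (b < a)

    def __gt__(self, other: "Str") -> bool:
        return other < self      # a > b   <=>  b < a

    def __ge__(self, other: "Str") -> bool:
        return not self < other  # a >= b  <=>  not (a < b)
```

Each derived operator costs exactly one `__lt__` call, instead of a less-than check followed by a separate equality check.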
Disable this flaky entropy test so we can get CI green again.
We'll come back to this test soon.

MODULAR_ORIG_COMMIT_REV_ID: fbd356c1f5e9fc537b04d9947aa1d4ab0c103e5f
Replace uses of `StaticTuple` with `InlineArray`. Soon, `StaticTuple`
will be deprecated. This requires removing some register-passable annotations on
types, since `StaticTuple` was register-passable trivial but `InlineArray` is, of
course, not.

As a bonus, this should speed up compile times a little bit since in some cases,
we were using a `StaticTuple` of size `1024` which would not compile very fast.
See modularml#2425.

MODULAR_ORIG_COMMIT_REV_ID: 004b5334e3bd78a8a8054d6762501efc557df716
[External] [stdlib] Support print to stderr

Add a keyword argument to the `print` function to support printing to stderr.

Fixes modularml#2453.

Signed-off-by: Yun Ding <yunding.eric@gmail.com>

Co-authored-by: GeauxEric <yunding.eric@gmail.com>
Closes modularml#2457
MODULAR_ORIG_COMMIT_REV_ID: 8530deea5047dbea191b6e87f8d113549bf9d121
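Python spells printing to stderr as `print(..., file=sys.stderr)`. The name and type of the new keyword argument on Mojo's `print` aren't given in this commit message, so the following is only the Python analogue, with the stream captured to show where the output goes. `report` is a hypothetical helper.

```python
import io
import sys
from contextlib import redirect_stderr


def report(message: str) -> None:
    # Route diagnostics to standard error instead of standard output.
    print("error:", message, file=sys.stderr)


captured = io.StringIO()
with redirect_stderr(captured):
    report("disk full")
print(repr(captured.getvalue()))  # 'error: disk full\n'
```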
@JoeLoser JoeLoser requested review from jackos and a team as code owners May 11, 2024 02:45
@JoeLoser JoeLoser merged commit 836626b into modularml:nightly May 11, 2024
5 checks passed
@JoeLoser JoeLoser deleted the 9040375184-1-nightly branch May 11, 2024 03:04