Explorer new dataset format #682

Merged: 11 commits, Nov 28, 2023

@PhilippeMoussalli PhilippeMoussalli commented Nov 27, 2023

PR that changes the explorer to match the new dataset format.

image

There is a weird refresh error occurring, which might get resolved, or at least be easier to debug, after removing the partitions.
Let's focus on getting this PR merged first, since it might fix the issue.

@RobbeSneyders
Member

Thanks @PhilippeMoussalli!

Ran into some issues when testing:

  • When trying to select a pipeline run on the first page, I get the following error:

    File "/home/robbe/.cache/pypoetry/virtualenvs/fondant-n6_n8sMX-py3.10/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 534, in _run_script
        exec(code, module.__dict__)
    File "/home/robbe/workspace/fondant/data_explorer/app/main.py", line 189, in <module>
        app.setup_app_page()
    File "/home/robbe/workspace/fondant/data_explorer/app/main.py", line 165, in setup_app_page
        selected_run_update_date = selected_run_info["Last Updated"].to_dict()[0]
    KeyError: 0
    

    It does work when it's first loaded.

  • The dataset explorer flickers a couple times when first loaded, and after anything I do on the page
    fondant-explorer-bug-flickering

  • If I refresh on any page except for the overview page, the page becomes narrow:
    fondant-explorer-bug-narrow

  • I don't have images in my dataset, which leads to an error in the image viewer:
    image

  • I also get an error on the numeric analysis page. I don't have numeric columns, but I don't think that's the reason:
    image
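One plausible reading of the `KeyError: 0` in the first item, offered as an assumption rather than a confirmed diagnosis: `Series.to_dict()` is keyed by index label, so `[0]` only succeeds when the selected row happens to carry label `0`. That would match the observation that it works when first loaded. The sketch below uses a hypothetical `runs` table and helper names that are not taken from the explorer code.

```python
import pandas as pd

# Hypothetical stand-in for the explorer's table of pipeline runs.
runs = pd.DataFrame(
    {
        "Run": ["run-a", "run-b", "run-c"],
        "Last Updated": ["2023-11-25", "2023-11-26", "2023-11-27"],
    }
)


def last_updated_by_label(selected: pd.DataFrame) -> str:
    # Mirrors the failing line: to_dict() is keyed by index label, not position.
    return selected["Last Updated"].to_dict()[0]


def last_updated_by_position(selected: pd.DataFrame) -> str:
    # iloc[0] picks the first row regardless of which index label it carries.
    return selected["Last Updated"].iloc[0]


first = runs[runs["Run"] == "run-a"]   # filtered row keeps index label 0
second = runs[runs["Run"] == "run-b"]  # filtered row keeps index label 1

try:
    last_updated_by_label(second)
    raised = False
except KeyError:
    raised = True
```

Under this assumption, positional access (or resetting the index after filtering) would avoid the error for any selected run.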

@PhilippeMoussalli
Contributor Author

> Thanks @PhilippeMoussalli!
>
> Ran into some issues when testing:

Yes, I ran into a few of those as well. I pushed new updates to fix them:

- Flickering
This was caused by the pandas dataframe being reloaded and merged every time. I fixed it by caching the function that loads it. We're using the mapping fields dict as the cache key in this case (it is not possible to cache the Dask dataframe object).
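As a rough illustration of caching a loader on a mapping dict rather than on the dataframe itself, here is a minimal standard-library sketch. The names `cache_key`, `load_cached`, and `fake_loader` are hypothetical, and this is not the explorer's implementation; in a Streamlit app a caching decorator such as `st.cache_data` could play the same role.

```python
import json
from typing import Any, Callable, Dict, List

# Module-level cache: serialised field mapping -> loaded data.
_cache: Dict[str, Any] = {}


def cache_key(field_mapping: Dict[str, List[str]]) -> str:
    # Dicts are not hashable, so serialise with sorted keys to get a stable key.
    return json.dumps(field_mapping, sort_keys=True)


def load_cached(
    field_mapping: Dict[str, List[str]],
    loader: Callable[[Dict[str, List[str]]], Any],
) -> Any:
    # Run the expensive loader only on the first request for a given mapping.
    key = cache_key(field_mapping)
    if key not in _cache:
        _cache[key] = loader(field_mapping)
    return _cache[key]


calls: List[Dict[str, List[str]]] = []


def fake_loader(field_mapping: Dict[str, List[str]]) -> List[dict]:
    # Stand-in for the expensive load-and-merge step; records each invocation.
    calls.append(field_mapping)
    return [{"id": 1}]


a = load_cached({"images": ["data"], "text": ["caption"]}, fake_loader)
b = load_cached({"text": ["caption"], "images": ["data"]}, fake_loader)
```

Both calls describe the same mapping, so the loader runs once and the second call returns the cached object, which is what stops the repeated reload.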

The rest of the small issues were also handled (pipeline visualization, image gallery, numerical analysis, fixed screen size). I fixed some other issues as well to make the datasets easier to navigate. Let me know if you run into any more issues.

Member

@RobbeSneyders RobbeSneyders left a comment

Thanks, seems to work without issues now.

One downside to the global pagination is that all the built-in filtering, sorting, etc. now only work on the 20 loaded rows. Not sure if this is an issue.

@PhilippeMoussalli
Contributor Author

> Thanks, seems to work without issues now.
>
> One downside to the global pagination is that all the built-in filtering, sorting, etc. now only work on the 20 loaded rows. Not sure if this is an issue.

Yeah, I think before it was also only enabled per individually loaded partition. The global search should partially solve this.

We can decide to omit the built-in filtering completely if having two search functionalities seems confusing.
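The trade-off discussed here can be sketched in plain Python: filtering inside a loaded page only sees that page's 20 rows, while a global search filters first and paginates afterwards. The data and function names below are hypothetical and only illustrate the ordering of the two steps.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 20  # page size mentioned in the review

Row = Dict[str, object]

# Hypothetical dataset: every 7th row contains "cat", the rest "dog".
rows: List[Row] = [
    {"id": i, "text": "cat" if i % 7 == 0 else "dog"} for i in range(100)
]


def page(data: List[Row], page_number: int, page_size: int = PAGE_SIZE) -> List[Row]:
    # Return one page of rows using zero-based page numbers.
    start = page_number * page_size
    return data[start:start + page_size]


def filter_within_page(
    data: List[Row], page_number: int, predicate: Callable[[Row], bool]
) -> List[Row]:
    # Built-in table filtering: paginate first, then filter the loaded rows.
    return [row for row in page(data, page_number) if predicate(row)]


def global_search(
    data: List[Row], page_number: int, predicate: Callable[[Row], bool]
) -> List[Row]:
    # Global search: filter the whole dataset first, then paginate the matches.
    matches = [row for row in data if predicate(row)]
    return page(matches, page_number)


def is_cat(row: Row) -> bool:
    return row["text"] == "cat"


in_page = filter_within_page(rows, 0, is_cat)
everywhere = global_search(rows, 0, is_cat)
```

With this data the in-page filter finds only the matches among the first 20 rows, while the global search returns all 15 matches on its first page.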

@PhilippeMoussalli PhilippeMoussalli merged commit 6a84677 into main Nov 28, 2023
6 checks passed
@PhilippeMoussalli PhilippeMoussalli deleted the explorer-new-dataset-format branch November 28, 2023 15:22