Consider adding tabular dataset #19

kachayev · 2023-11-21T16:21:18Z

For example, the one with personal/business flights I've been experimenting with.

It would be nice to have more than CV datasets provided out of the box.

YanisLalou · 2024-01-16T14:28:35Z

Just to be clear: the goal here is to create a file like _office.py to download/process a tabular dataset, right?

tgnassou · 2024-01-16T14:32:12Z

Exactly, but I think we want an easy one, ideally a SOTA dataset I would say. Maybe we need to check the paper to see if we can find the most popular one. Later on, we will add more complex datasets to a bench_skada repo.

kachayev · 2024-01-16T14:40:47Z

I would say that, first off, we need to pick a suitable tabular dataset for Domain Adaptation (DA). I've had some preliminary results with this dataset: Airline Passenger Satisfaction on Kaggle. It's perfect for our needs because you can easily differentiate between personal and business flights, giving us a clear source vs. target scenario.
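As a rough illustration of the source vs. target split described above, here is a minimal sketch using only the standard library. The column names (`Type of Travel`, `satisfaction`) and their values are assumptions about the Kaggle CSV, the inline sample rows are made up for the example, and `split_by_domain` is a hypothetical helper, not part of skada.

```python
import csv
import io

# Tiny inline stand-in for the real CSV file.
# Column names and values are assumptions, and the rows are invented.
SAMPLE_CSV = """Type of Travel,Age,Flight Distance,satisfaction
Business travel,41,2200,satisfied
Personal Travel,23,450,neutral or dissatisfied
Business travel,35,1800,satisfied
Personal Travel,60,300,neutral or dissatisfied
Personal Travel,19,700,satisfied
"""


def split_by_domain(rows, domain_column, source_value):
    """Split rows into (source, target) based on the value of one column."""
    source = [row for row in rows if row[domain_column] == source_value]
    target = [row for row in rows if row[domain_column] != source_value]
    return source, target


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
source_rows, target_rows = split_by_domain(rows, "Type of Travel", "Business travel")
print(len(source_rows), "source rows,", len(target_rows), "target rows")
```

With the real file, the same helper would be applied to the rows read from disk; which travel type plays the role of the source domain is a free choice.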

Next up, let's create a concise tutorial. The goal here is to demonstrate how the performance of a classifier trained on one domain tends to decline when applied to another, and how to improve it using DA techniques. At this stage there's no need to worry about dataset processing; we're talking about maybe 10-20 lines of code to download and clean up the dataset.
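The effect such a tutorial would demonstrate can be sketched on synthetic data, without any real dataset. The example below is an assumption-laden toy: a 1-D two-class problem where the target domain is the source domain shifted by a constant, a nearest-centroid threshold classifier, and mean alignment as a deliberately simple stand-in for a DA method (it is not skada's API, and it relies on both domains having the same class proportions). All function names are hypothetical.

```python
import random
import statistics


def make_domain(rng, n_per_class, shift):
    """Sample a 1-D, two-class dataset; `shift` moves both classes together."""
    xs, ys = [], []
    for label, center in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            xs.append(rng.gauss(center + shift, 0.5))
            ys.append(label)
    return xs, ys


def fit_threshold(xs, ys):
    """Nearest-centroid classifier in 1-D: threshold halfway between class means."""
    mean0 = statistics.fmean([x for x, y in zip(xs, ys) if y == 0])
    mean1 = statistics.fmean([x for x, y in zip(xs, ys) if y == 1])
    return (mean0 + mean1) / 2.0


def accuracy(threshold, xs, ys):
    """Fraction of samples whose predicted label (x > threshold) matches y."""
    return sum(int(x > threshold) == y for x, y in zip(xs, ys)) / len(ys)


rng = random.Random(0)
xs_src, ys_src = make_domain(rng, 500, shift=0.0)  # source domain
xs_tgt, ys_tgt = make_domain(rng, 500, shift=2.0)  # shifted target domain

# Train on source only, then evaluate on both domains.
threshold = fit_threshold(xs_src, ys_src)
acc_source = accuracy(threshold, xs_src, ys_src)
acc_target = accuracy(threshold, xs_tgt, ys_tgt)

# Mean alignment: uses target features only, never target labels.
offset = statistics.fmean(xs_src) - statistics.fmean(xs_tgt)
xs_tgt_aligned = [x + offset for x in xs_tgt]
acc_adapted = accuracy(threshold, xs_tgt_aligned, ys_tgt)

print(f"source: {acc_source:.2f}, target: {acc_target:.2f}, adapted: {acc_adapted:.2f}")
```

The target labels are used only to score the result, which mirrors the unsupervised DA setting. A real tutorial would replace the toy classifier and the alignment step with an actual estimator and a DA method from the library.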

Once we have this in place, our next step is to package the dataset in a user-friendly way, similar to what we've seen with Office31. This way, it's ready to roll 'out-of-the-box' for anyone installing our library.

Why this order? Well, it's crucial to ensure that our chosen dataset fits the needs of a DA library.

Let me know what you think.

YanisLalou · 2024-01-16T14:50:57Z

About the dataset choice: at first glance, the Airline Passenger Satisfaction dataset on Kaggle has no license defined, no author names, and no DOI. Thus we don't even know whether it's open source or not.
Initially we wanted to select one of the datasets used in this paper: https://arxiv.org/pdf/2312.07577.pdf
These datasets are all open source, and there are also benchmarks with them.
However, we haven't decided yet which one we're going to add to skada first. Maybe the one with the most citations? The one that seems to have the best accuracy results in the benchmarks with DA methods?

kachayev · 2024-01-16T14:56:11Z

Oh, interesting. I haven't seen this paper yet. Were you able to re-run their experiments to verify results?

YanisLalou · 2024-01-16T15:09:22Z

I don't think we've tried to reproduce the results, and I don't know if we plan to do it.

tgnassou · 2024-01-16T15:25:23Z

It is a distribution-shift tabular dataset, but they don't use any domain adaptation method in their benchmark :( So I didn't try to rerun their code. But it will be interesting for the benchmark.

kachayev · 2024-01-17T08:58:39Z

> they don't use any domain adaptation method in their benchmark

Yeah... whichever dataset we choose, it's essential to ensure that we can showcase the use of DA methods.

tgnassou added a commit that referenced this issue Nov 22, 2023

Merge pull request #19 from tgnassou/skorch

de7e30f

[MRG] Use skorch class for deep DA

YanisLalou mentioned this issue Jan 17, 2024

[WIP] Tabular dataset + Doc for testing the advantage of using DA techniques #59

Open
