'Type not implemented or supported' exception message from TextLoader is not descriptive #128

Closed
v-tsymbalistyi opened this issue May 11, 2018 · 14 comments

@v-tsymbalistyi
Contributor

Issue

  • I added the ML package for the first time and tried to customize a tutorial a bit.
  • I got a 'Type not implemented or supported' exception. I did not understand which type was not supported, wasted a lot of time, and ended up on GitHub looking for answers.
  • I would like to know which type is actually not supported yet. It would ease the learning curve and save me some nerves.
@shauheen shauheen added the enhancement New feature or request label May 11, 2018
@shauheen shauheen added this to the 0518 milestone May 11, 2018
@remware

remware commented May 11, 2018

Will the datetime type be supported at some point?

@zeahmed
Contributor

zeahmed commented May 11, 2018

@v-tsymbalistyi, currently string, float and bool are supported. Can you please confirm whether you used a type other than these that caused this exception? If not, it would be good if you could share your sample.
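
A minimal sketch of how a check could name the offending field, assuming the supported set is exactly the three types listed above. `SampleRow`, `FieldTypeCheck` and `FindUnsupportedFields` are hypothetical names made up for illustration; this uses plain reflection and is not how TextLoader is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical data class for illustration; not taken from the thread.
public class SampleRow
{
    public float Price;
    public string Name;
    public int Year;          // int is outside the supported set listed above
    public DateTime Created;  // so is DateTime
}

public static class FieldTypeCheck
{
    // Assumption: the supported set is exactly the three types named in the thread.
    private static readonly HashSet<Type> Supported =
        new HashSet<Type> { typeof(string), typeof(float), typeof(bool) };

    // Returns one descriptive message per public instance field
    // whose type is not in the supported set.
    public static List<string> FindUnsupportedFields(Type rowType)
    {
        var problems = new List<string>();
        foreach (FieldInfo field in rowType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            if (!Supported.Contains(field.FieldType))
            {
                problems.Add($"Field '{field.Name}' has unsupported type '{field.FieldType.Name}'.");
            }
        }
        return problems;
    }
}

public static class Program
{
    public static void Main()
    {
        // Prints one line per unsupported field of SampleRow.
        foreach (string problem in FieldTypeCheck.FindUnsupportedFields(typeof(SampleRow)))
        {
            Console.WriteLine(problem);
        }
    }
}
```

A message that carries both the field name and the type name is what lets the reader fix the data class without searching for answers.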

@remware, can you please share a use case where the datetime type would be useful? (Just to make a case for supporting it.)

@remware

remware commented May 11, 2018

Oh, yes, I would need DateTime and int if possible. As of now, DateTime would be useful.
https://github.com/ExtensiveLifeOy/statePred/blob/master/feedbacks-19.txt

In my example if I input a value of 0.3 the system predicts different states in different runs so probably the values are quite close. Do you have a pointer to other trainer models?

@zeahmed
Contributor

zeahmed commented May 11, 2018

@remware, Thanks for sharing example.

I mean, how will you use the datetime type in a learning algorithm? Machine learning algorithms operate only on numbers. Some learning tasks depend only on the year, so in that case the year part is extracted as a feature from the datetime field. Other learning tasks require day, month and year as features, in which case these three are extracted from the datetime field.

So, it's not just about supporting the datetime type but also about supporting an appropriate transform that can operate on this datetime type and extract features from it.

I wanted to know if you have any end-to-end case where datetime is read as a proper type: how exactly do you transform the datetime into features and use them in a learning algorithm?

PS: most of the time when data depends on time, the problem becomes a time-series problem rather than simply a classification or regression problem.
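
The extraction described above can be sketched in plain C#, assuming the date arrives as text in `yyyy-MM-dd` form. `DateFeatures`, `FromDate` and `FromText` are hypothetical names for illustration, not an ML.NET transform.

```csharp
using System;
using System.Globalization;

public static class DateFeatures
{
    // Turns a date into three numeric features: year, month, day.
    public static float[] FromDate(DateTime date)
    {
        return new float[] { date.Year, date.Month, date.Day };
    }

    // Assumption: the input text uses the fixed format yyyy-MM-dd.
    public static float[] FromText(string text)
    {
        DateTime date = DateTime.ParseExact(text, "yyyy-MM-dd", CultureInfo.InvariantCulture);
        return FromDate(date);
    }
}

public static class Program
{
    public static void Main()
    {
        float[] features = DateFeatures.FromText("2013-12-01");
        // Year, month and day are now plain floats that a learner can consume.
        Console.WriteLine(string.Join(", ", features));
    }
}
```

A task that depends only on the year would keep just the first element.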

@zeahmed
Contributor

zeahmed commented May 11, 2018

> In my example if I input a value of 0.3 the system predicts different states in different runs so probably the values are quite close. Do you have a pointer to other trainer models?

Which learner are you using?

@v-tsymbalistyi
Contributor Author

@zeahmed Same for me. I tried to use DateTime and int as well.
It would be nice to have nullable versions of those.

I can live without it for now. I just couldn't figure out what was going on at the start. That is why I decided to create this issue.

@glebuk glebuk closed this as completed in 3780923 May 11, 2018
@remware

remware commented May 11, 2018

I guess the DateTime use case is a bit complex. We need something like this: when we mark a date, all the dates following the marker are considered state dependent. I'm not sure yet how to do the transformation, but that would be the next step. At the moment I am using StochasticDualCoordinateAscentClassifier, but for my tests I would need Naive Bayes, Clojure, A1DE and MLP Classifier.

Also, I notice that the predicted column in the data model is always "Label". Is this by design, or can I use another name there (assuming I also sync the name in Dictionarizer)?

@zeahmed
Contributor

zeahmed commented May 11, 2018

StochasticDualCoordinateAscentClassifier is stochastic, as the name indicates. It initializes values through a stochastic process at the start, similar to other linear algorithms. So predictions are expected to differ slightly after every training run. To get deterministic results, there is a seed value parameter that is not exposed yet. Issue #9 has already been opened for it.
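
To illustrate why a seed makes runs repeatable, here is a generic sketch using `System.Random`. It is not the SDCA trainer; `SeedDemo` and `InitialWeights` are hypothetical names, and the weight range is an arbitrary choice for the example.

```csharp
using System;

public static class SeedDemo
{
    // Draws `count` starting values in [-0.5, 0.5) from the given generator.
    public static double[] InitialWeights(Random rng, int count)
    {
        var weights = new double[count];
        for (int i = 0; i < count; i++)
        {
            weights[i] = rng.NextDouble() - 0.5;
        }
        return weights;
    }
}

public static class Program
{
    public static void Main()
    {
        // Same seed: both generators produce the same sequence.
        double[] a = SeedDemo.InitialWeights(new Random(42), 3);
        double[] b = SeedDemo.InitialWeights(new Random(42), 3);
        Console.WriteLine(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]);

        // No seed: the starting values will usually differ between runs.
        double[] c = SeedDemo.InitialWeights(new Random(), 3);
        Console.WriteLine(string.Join(", ", c));
    }
}
```

With a fixed seed the starting point is identical on every run, which is the behavior an exposed seed parameter would give.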

Here are other learners that you can try:

  1. Microsoft.ML.Trainers.NaiveBayesClassifier
  2. Microsoft.ML.Trainers.LogisticRegressor

Once issue #34 is resolved, you will be able to use a bunch of binary classifiers for the multi-class classification case.

@v-tsymbalistyi
Contributor Author

Great news @zeahmed
Thank you guys for all the work you are doing!

@raghumuttana

raghumuttana commented May 28, 2018

(Correcting previous question: This issue is only when I have int). Why is it I should always use float/double?

I had a simple row of data with headers in the csv as follows:
Year,Month,Day
2013,12,1

When I use the following line of code:
var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader<FlightData>(@"C:\Code\ML\ConsoleApp1\Test.csv", useHeader: false, separator: ",", allowQuotedStrings: true, supportSparse: false, trimWhitespace: true));

To make sure the file Test.csv is being read properly, I used
var text = System.IO.File.ReadAllLines(@"C:\Code\ML\ConsoleApp1\Test.csv"); and no issues in reading file. However, I get the error as follows at pipeline.Add(new TextLoader<...)

I get:

{System.Exception: Type not implemented or supported.
   at Microsoft.ML.TextLoader`1.TypeToName(Type type)
   at Microsoft.ML.TextLoader`1.SetCustomStringFromType(Boolean useHeader, String separator, Boolean allowQuotedStrings, Boolean supportSparse, Boolean trimWhitespace)
   at ConsoleApp1.Program.Main(String[] args)}

@zeahmed
Contributor

zeahmed commented May 30, 2018

@raghumuttana, Just looking at your example, I see that your file has a header. However, in TextLoader you are setting "useHeader: false". It should be "true". I hope that will fix the issue.

Otherwise, please post your code so we can get a deeper understanding of your issue.
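
As a rough illustration of why the header flag matters for numeric columns, here is a plain C# sketch that reads rows like the ones posted above into floats. It does not use TextLoader and says nothing about TextLoader's internals; `CsvSketch` and `ParseRows` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CsvSketch
{
    // Parses comma-separated lines into float rows.
    // When hasHeader is true, the first line is skipped instead of parsed.
    public static List<float[]> ParseRows(IEnumerable<string> lines, bool hasHeader)
    {
        var rows = new List<float[]>();
        bool first = true;
        foreach (string line in lines)
        {
            bool isHeader = first && hasHeader;
            first = false;
            if (isHeader || string.IsNullOrWhiteSpace(line))
            {
                continue;
            }

            string[] cells = line.Split(',');
            var row = new float[cells.Length];
            for (int i = 0; i < cells.Length; i++)
            {
                // Throws FormatException if a cell is not numeric, e.g. a header name.
                row[i] = float.Parse(cells[i].Trim(), CultureInfo.InvariantCulture);
            }
            rows.Add(row);
        }
        return rows;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] lines = { "Year,Month,Day", "2013,12,1" };
        List<float[]> rows = CsvSketch.ParseRows(lines, hasHeader: true);
        Console.WriteLine(string.Join(", ", rows[0]));
    }
}
```

Calling `ParseRows` on the same lines with `hasHeader: false` tries to parse "Year" as a number and throws a `FormatException`.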

@helloguo

helloguo commented Jun 4, 2018

> Currently string, float and bool are supported

@zeahmed, do you have any idea which types are going to be supported eventually?

@zeahmed
Contributor

zeahmed commented Jun 4, 2018

Here is the list of types currently supported.

private static bool TryGetDataKind(Type type, out DataKind kind)

@remware

remware commented Jun 5, 2018

What about using a string as a feature? I tried with Dictionarizer but got an error. Is there any way to assign values to the occurrence of certain strings in a text field? I was thinking of doing a kind of sentiment analysis, but for reported issues, so I could automatically assign certain bugs/issues to the corresponding team or component. Is that possible?

eerhardt pushed a commit to eerhardt/machinelearning that referenced this issue Jul 27, 2018
Make a 'not supported field type' exception more readable, so the developer could figure out why he can't load the data
This closes dotnet#128
@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2022