Fastest way to check if an object is an int or float in one pass #36
Comments
Thanks for the fast response, @SethMMorton. Maybe it is something niche, but it would be helpful for filtering data types in columns. |
To be pedantic, no, there is currently no functionality for that directly. You could do |
Thanks. I am just trying to gain the maximum speed I can get :) |
If this is something you think would be useful, I would be very open to a PR. I don't think it would require adding any new algorithms, just a new top-level function utilizing existing code. |
Thanks, @SethMMorton. At the moment I have neither the bandwidth nor the C knowledge to tackle this, but I will be more than happy to collaborate in the future if I cannot get other options to work. |
Is the suggestion I made usable, or is this something you need? I'm curious, what application do you need this for? |
Thanks @SethMMorton. I am the main developer of Bumblebee (https://github.com/ironmussa/Bumblebee/tree/develop-3.0). The problem with the default approach in Dask is that it loads the data in chunks and tries to infer the datatype in every chunk. Sometimes it fails because different chunks yield different data types. The final goal is to reduce memory usage by casting to the data type that best represents the data. |
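To make the per-chunk problem concrete, here is a minimal sketch of reconciling the types inferred for each chunk into one column type. The function name `reconcile_chunk_types` and the widening order (int, then float, then str) are assumptions made for illustration; this is not how Dask or Bumblebee actually resolve dtypes.

```python
# Widening order assumed for this sketch: int < float < str.
_WIDENING_ORDER = [int, float, str]


def reconcile_chunk_types(chunk_types):
    """Return the narrowest type able to represent every chunk's inferred type.

    `chunk_types` is an iterable of int, float, or str, one entry per chunk.
    """
    chunk_types = list(chunk_types)
    if not chunk_types:
        raise ValueError("at least one chunk type is required")
    # The type that sits furthest along the widening order wins.
    return max(chunk_types, key=_WIDENING_ORDER.index)


# Chunks inferred as int, float, int: the column as a whole must be float.
column_type = reconcile_chunk_types([int, float, int])
```

Deciding the type once for the whole column, instead of per chunk, avoids the mismatch described above at the cost of a full pass over the data.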
Doesn't pandas auto-infer the datatype for you? Or are you trying to infer the type of your dataset before inserting it into the dataframe? Either way, I can see the value of a function that tells you the type rather than just answering "is this a particular type?". I think I was a bit thrown off by the specificity of
Rough Python equivalent, put in terms of existing fastnumbers functionality:
from fastnumbers import isint, isfloat

def detect_type(x):
    # Check for int first, since an int-like value would also pass a float check.
    if isint(x):
        return int
    elif isfloat(x):
        return float
    else:
        return None
Open questions:
- If given a string that is non-numeric, should it return None as shown above, or return str?
- If given something completely crazy, like a list, should it return None, list, or raise a TypeError?
|
Yes, pandas can infer the datatype. The problem is Dask, because it infers the datatype in every chunk of data. The code you wrote is exactly what I am doing right now :) |
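The open questions about non-numeric strings and non-string objects can be made concrete with a small sketch. It answers them one possible way: non-numeric strings return `str`, and anything else returns `None` unless the caller asks for a `TypeError`. The `raise_on_invalid` parameter is hypothetical, and the built-in `int()` and `float()` stand in for `isint` and `isfloat`, whose parsing rules may differ. This is not necessarily the behavior fastnumbers adopted.

```python
def detect_type(x, raise_on_invalid=False):
    """Return int, float, or str for numbers and strings (illustrative only)."""
    # Assumption: bool is excluded even though it subclasses int.
    if isinstance(x, (int, float)) and not isinstance(x, bool):
        return int if isinstance(x, int) else float
    if isinstance(x, str):
        # Try the narrower type first; "12" parses as int, "1.5" only as float.
        for candidate in (int, float):
            try:
                candidate(x)
                return candidate
            except ValueError:
                continue
        return str  # a string that is not numeric
    if raise_on_invalid:
        raise TypeError(f"cannot classify object of type {type(x).__name__}")
    return None


kinds = [detect_type(v) for v in ("12", "1.5", "abc", [1])]
```

Returning `str` for non-numeric strings keeps `None` free to mean "not a number or a string at all", which lets a caller tell bad data apart from text data.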
@argenisleon I have created a PR for this at #38. Can you please review? At the very least, please review the following: |
Sure @SethMMorton, I will review this today. |
Closed by #38 |
FYI - this was released as part of fastnumbers 3.1.0 |
Hi,
Is there a way to check if an object is an int or float in one pass?
Right now I am using .isint and .isfloat. Any help?