df2sd slow #499
-
I am trying to save a DataFrame as a SAS dataset using the df2sd function. I also provided a dictionary containing the names and lengths of all of the character columns, which skips the code that calculates the lengths and goes straight to transferring the data. Even so, it ends up taking about 30 minutes to transfer a DataFrame of 3000 rows and 1300 columns, around 10 MB in size. I am using the code below: sas = saspy.SASsession(cfgname='winlocal') I'm not sure why it takes around 30 minutes to transfer data of only 10 MB with 3000 rows and 1300 columns.
Replies: 7 comments 1 reply
-
Wow, that's slow. Can you show me your configuration, and also submit your SASsession object so I can see the info from it (submit 'sas', or, if running in batch, print(sas))? Out of curiosity, how long does sas.df_char_lengths(df) take?
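One way to answer the timing question is to wrap each call in a small timer. The following is only a sketch: `timed`, `timings`, and `stand_in_transfer` are hypothetical names introduced for illustration, and the stand-in function replaces the real saspy calls so the snippet runs without a SAS session. The commented lines show where `sas.df_char_lengths(df)` and `sas.df2sd(...)` would go, assuming `sas` and `df` already exist.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent in the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def stand_in_transfer(n):
    # Hypothetical placeholder for the real work; it only burns a little CPU.
    return sum(range(n))


timings = {}
with timed("stand_in", timings):
    total = stand_in_transfer(100_000)

# With a live session you would wrap the real calls instead (assumed usage):
#   with timed("char_lengths", timings):
#       d1 = sas.df_char_lengths(df)
#   with timed("df2sd", timings):
#       sd = sas.df2sd(df, char_lengths=d1)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```

Comparing the two timings would show whether the time goes into computing the lengths or into the transfer itself.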
-
Ok, thanks. Looks like you have the latest saspy and are using IOM, which is correct. You should remove the classpath from that config; it hasn't been needed for some time now (a year or more). Not that it would be making things slow, but it helps eliminate other issues. The next thing to see would be the SASLOG from the df2sd(). Can you print(sas.lastlog()) upon completion of the df2sd() method so we can see if there's anything in there that could provide info? You can also do print(sas.saslog()) for the whole session log. Can you also provide the dict d1 so I can see what the dataframe looks like? Maybe I can try to simulate this.
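When the log is long, a quick scan for lines that begin with ERROR or WARNING can help before reading the whole thing. This is a rough sketch, not part of saspy: `find_log_problems` is a hypothetical helper, and `sample_log` is made-up text used only so the snippet runs on its own. With a live session you would pass the string returned by sas.lastlog() or sas.saslog() instead.

```python
def find_log_problems(log_text):
    """Return (line_number, line) pairs for lines starting with ERROR or WARNING."""
    problems = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped.startswith(("ERROR", "WARNING")):
            problems.append((number, stripped))
    return problems


# Made-up log text for illustration only; it is not output from a real session.
sample_log = (
    "NOTE: example note line (made up)\n"
    "WARNING: example warning line (made up)\n"
    "NOTE: another example note line (made up)\n"
)

problems = find_log_problems(sample_log)
for number, line in problems:
    print(f"line {number}: {line}")

# Assumed usage with a live session:
#   for number, line in find_log_problems(sas.lastlog()):
#       print(number, line)
```

An empty result does not mean the run was healthy; as in this thread, a slow transfer can leave a log with nothing unusual in it.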
-
Is there something specific you are looking for in the SAS logs? I doubt
I can share the SAS logs and d1 because of my organization's data
restrictions, but I will give it a try.
-
No problem, I understand about that. FWIW, you can just email me info like that directly instead of posting it. Since I'm with SAS, the same NDA covers that kind of info as it would with our Tech Support folks, so there shouldn't be any issue with that. As for the log: it's SAS, so any time something unexpected happens, looking at the log is one of the first things to do.
-
I see that the version of SAS you're connecting to is over 8 years old. So, I'm trying to run against that version myself. The version of M2 I can run is a slightly newer patch level than yours, but I'm seeing that it's taking much longer than a few minutes to run. So, once this finishes, I'll look at the log and see if there's anything different from what I saw in my previous attempt, which was against a more current version of SAS.
Well, it took just over 12 minutes to run the same job with M2. Nothing in the log showed anything behaving unexpectedly. It is just a lot slower. So, there must have been some significant performance improvements in the data step since then. The version of SAS I happened to be running against in my previous run (2.5 minutes) was M5, which is also a bit old (5 years), but has much better performance. Since there was nothing 'wrong' in the M2 case and the code works as expected, it seems to just be SAS itself. Trying against M7, our current production release, it took 1.6 minutes. M8 is supposed to be coming out early next year. But it seems anything reasonably new will perform much better. I haven't tried M4/M5, but that's moot; you wouldn't upgrade to that when you should get the current production version.
Since it costs nothing to get the current version of SAS, is there any way you can just install a current version on your PC to address this? It doesn't seem to be something that can be addressed in saspy itself.
Thanks!
-
OK, let me check whether I can get SAS upgraded. Also, does using PyCharm
instead of Jupyter contribute to the slowness?
-
Good deal. No, which Python interface you use has nothing to do with it. I mostly run in line mode, in a terminal session, and get the same results. It's not about the Python interface.
This is M8:
while this is M2: