"timeColumn" option not respected in a "read.dataframe" call #56

Open
LeoDashTM opened this issue Oct 26, 2018 · 1 comment


@LeoDashTM

@icexelloss hi there!

I'm glad the issues are being (pro)actively monitored and attended to; I wasn't expecting that.

Here is one issue I'm facing. It's not a big one, but it is inconvenient:

print( sc.version )
print( tm )

n = df.filter( df['Container'] == 'dbc94d4e3af6' ).select( tm, 'MemPercentG', 'CpuPercentG' )
n.show( truncate = False )
n.printSchema()

from ts.flint import FlintContext, clocks
from ts.flint import utils
  
fc = FlintContext( sqlContext )

r = fc.read \
    .option('isSorted', False) \
    .option('timeUnit', 's') \
    .option('timeColumn', tm) \
    .dataframe( n )

The output is:

2.3.1
TimeStamp
+-------------------+---------------+------------+
|TimeStamp          |MemPercentG    |CpuPercentG |
+-------------------+---------------+------------+
|2018-08-01 05:55:35|0.0030517578125|0.002331024 |
|2018-08-01 05:58:05|0.0030517578125|0.0031538776|
|2018-08-01 05:59:05|0.0030517578125|0.0030176123|
+-------------------+---------------+------------+

root
 |-- TimeStamp: timestamp (nullable = true)
 |-- MemPercentG: double (nullable = true)
 |-- CpuPercentG: float (nullable = true)

IllegalArgumentException: 'Field "time" does not exist.\nAvailable fields: TimeStamp, MemPercentG, CpuPercentG'
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<command-911439891027714> in <module>()
     14 fc = FlintContext( sqlContext )
     15 
---> 16 r = fc.read     .option('isSorted', False)     .option('timeUnit', 's')     .option('timeColumn', tm)     .dataframe( n )
     17 
     18 

/databricks/python/lib/python3.5/site-packages/ts/flint/readwriter.py in dataframe(self, df, begin, end, timezone, is_sorted, time_column, unit)
    362             time_column=time_column,
    363             is_sorted=is_sorted,
--> 364             unit=self._parameters.timeUnitString())
    365 
    366     def parquet(self, *paths):

/databricks/python/lib/python3.5/site-packages/ts/flint/dataframe.py in _from_df(df, time_column, is_sorted, unit)
    248                                    time_column=time_column,
    249                                    is_sorted=is_sorted,
--> 250                                    unit=unit)
    251 
    252     @staticmethod

/databricks/python/lib/python3.5/site-packages/ts/flint/dataframe.py in __init__(self, df, sql_ctx, time_column, is_sorted, unit, tsrdd_part_info)
    133         # throw exception
    134         if time_column in df.columns:
--> 135             self._jdf = self._jpkg.TimeSeriesRDD.canonizeTime(self._jdf, self._junit)
    136 
    137         if tsrdd_part_info:

/databricks/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258 
   1259         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
     77                 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78             if s.startswith('java.lang.IllegalArgumentException: '):
---> 79                 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80             raise
     81     return deco

IllegalArgumentException: 'Field "time" does not exist.\nAvailable fields: TimeStamp, MemPercentG, CpuPercentG'

Is this reproducible for you?

Please advise whether I'm calling it incorrectly or whether this is a bug.

I installed the flint libraries (both the Scala and the Python ones) on Databricks via its UI, from the respective online repos, which might be dated. I can try installing the latest builds from the freshest source code if you think that will help.

Thanks.

@LeoDashTM LeoDashTM changed the title timeColumn option not respected in a read.dataframe call "timeColumn" option not respected in a "read.dataframe" call Oct 26, 2018
@icexelloss
Member

I suspect that is a bug. Please rename the time column to "time" for the time being (pun intended).
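
A minimal sketch of that rename workaround, assuming the DataFrame `n`, the column name held in `tm`, and the `fc` context from the report above. The helper name `with_time_column` is hypothetical and not part of flint; it relies only on PySpark's `DataFrame.columns` and `DataFrame.withColumnRenamed`. Whether the renamed frame then loads cleanly has not been verified here beyond the suggestion above.

```python
def with_time_column(df, time_column, target="time"):
    """Return `df` with `time_column` renamed to `target`.

    Hypothetical helper for illustration only. It expects a
    DataFrame-like object exposing `.columns` and
    `.withColumnRenamed(existing, new)`, as PySpark DataFrames do.
    """
    if time_column == target:
        # Nothing to do: the column already has the expected name.
        return df
    if time_column not in df.columns:
        raise ValueError("column %r not found in %r" % (time_column, df.columns))
    if target in df.columns:
        # Avoid ending up with two columns called `target`.
        raise ValueError("column %r already exists" % target)
    return df.withColumnRenamed(time_column, target)


# Usage with the names from the report (assumed to exist in the session):
#
#   renamed = with_time_column(n, tm)      # 'TimeStamp' -> 'time'
#   r = fc.read \
#       .option('isSorted', False) \
#       .option('timeUnit', 's') \
#       .dataframe(renamed)
```

The `timeColumn` option is left out of the usage on purpose, since after the rename the column already carries the name that the error message says is being looked up.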
