[FEATURE] Support column names starting with numbers #365

Closed
andyndang opened this issue Sep 30, 2022 · 0 comments · Fixed by #366
**Minimal Code To Reproduce**

```python
import pandas as pd
from fugue import transform

# A single-column DataFrame whose column name starts with a digit
tdf = pd.DataFrame(
    {
        "1_col": [1],
    }
)


def no_op(df: pd.DataFrame) -> pd.DataFrame:
    # Return the input unchanged
    return df


result = transform(
    df=tdf,
    using=no_op,
    schema="*",
)
```

**Describe the bug**

_0 _State.RUNNING -> _State.FAILED Invalid column name 1_col

SchemaError Traceback (most recent call last)
/var/folders/c_/jg8n22hx1qd8lgmv955hs05m0000gr/T/ipykernel_43335/3739077820.py in
15 df=tdf,
16 using=no_op,
---> 17 schema="*",
18 )

~/opt/miniconda3/envs/pyspark2.4/lib/python3.7/site-packages/fugue/interfaceless.py in transform(df, using, schema, params, partition, callback, ignore_errors, engine, engine_conf, force_output_fugue_dataframe, persist, as_local, save_path, checkpoint)
165 else:
166 tdf.save(save_path, fmt="parquet")
--> 167 dag.run(engine, conf=engine_conf)
168 if checkpoint:
169 result = dag.yields["result"].result # type:ignore

~/opt/miniconda3/envs/pyspark2.4/lib/python3.7/site-packages/fugue/workflow/workflow.py in run(self, *args, **kwargs)
1521 if ctb is None: # pragma: no cover
1522 raise
-> 1523 raise ex.with_traceback(ctb)
1524 self._computed = True
1525 return DataFrames(

~/opt/miniconda3/envs/pyspark2.4/lib/python3.7/site-packages/triad/collections/schema.py in __setitem__(self, name, value, *args, **kwds)
182 assert_arg_not_none(value, "value")
183 if not validate_column_name(name):
--> 184 raise SchemaError(f"Invalid column name {name}")
185 if name in self: # update existing value is not allowed
186 raise SchemaError(f"{name} already exists in {self}")

SchemaError: Invalid column name 1_col


**Expected behavior**
Expect to support columns starting with numbers.

**Environment (please complete the following information):**
 - Backend: pandas
 - Backend version: 
 - Python version: 3.7
 - OS: linux/windows: linux
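
Until column names starting with digits are accepted, one possible workaround is to rename the affected columns before calling `transform` and restore them afterwards. The sketch below uses only pandas; the helper names `add_prefix_to_numeric_columns` and `restore_columns` and the prefix `c_` are hypothetical and chosen for illustration, not part of Fugue.

```python
import pandas as pd

# Hypothetical prefix; any string starting with a letter should work
PREFIX = "c_"


def add_prefix_to_numeric_columns(df: pd.DataFrame, prefix: str = PREFIX):
    # Map every column whose name starts with a digit to a prefixed name
    mapping = {c: f"{prefix}{c}" for c in df.columns if str(c)[:1].isdigit()}
    return df.rename(columns=mapping), mapping


def restore_columns(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    # Invert the mapping to get the original column names back
    return df.rename(columns={new: old for old, new in mapping.items()})


tdf = pd.DataFrame({"1_col": [1]})
safe_df, mapping = add_prefix_to_numeric_columns(tdf)
restored = restore_columns(safe_df, mapping)
```

Here `safe_df` would be passed to `transform` in place of `tdf`, and `restore_columns` applied to the result.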
@andyndang changed the title from "[BUG] Fugue doesn't support column name starting with numbers" to "[FEATURE] Support column names starting with numbers" on Sep 30, 2022
@goodwanghan added this to the 0.7.3 milestone on Oct 2, 2022
@goodwanghan added the "enhancement" (New feature or request) label on Oct 2, 2022