Add Python documentation #123

Merged
merged 1 commit into from
Aug 8, 2024

@linhr linhr commented Aug 8, 2024

Changes:

  1. Set up Python documentation using Sphinx.
  2. Integrate Sphinx output with VitePress.

With this PR, the Python documentation can be part of the SPA (single-page application) documentation site. I haven't seen other Python libraries do this, and I think it is a valuable approach that offers a good user experience for browsing the documentation.

Additional Notes

I spent some time investigating the Python documentation setup, and it turns out that the tooling is not satisfactory. Here are a few possible options for now.

Option 1: Write Python docstrings as Rust doc comments, and optionally write .pyi files for typing.

  • Sphinx can generate the documentation, since it can import extension modules and read docstrings from the __doc__ attribute. (PyO3 stores Rust doc comments in the __doc__ attribute.)
  • Rust doc comments are expected to use Markdown syntax, while Python documentation typically uses reStructuredText (reST), so writing Python docstrings in Rust may confuse the Rust IDE.
  • The Python IDE may not be able to read docstrings from extension modules.
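
To illustrate the first bullet, here is a minimal sketch of how a docstring is read from a compiled module at runtime. It uses the standard-library `math` module as a stand-in for a PyO3-built extension module; that substitution is an assumption for illustration only and is not part of this PR.

```python
import inspect
import math

# In CPython, `math` is implemented in C, so there is no Python source
# file to parse. The docstring is only reachable at runtime via `__doc__`.
doc = inspect.getdoc(math.sqrt)
print(doc)

# Sphinx autodoc takes the same route: it imports the module and reads
# `__doc__`, so the extension must be importable when the docs are built.
has_runtime_doc = bool(math.sqrt.__doc__)
```

A tool that only analyzes source files statically has nothing to read here, which is consistent with the IDE limitation noted in the last bullet.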

Option 2: Write .pyi files for docstrings and comments.

  • Sphinx does not support .pyi files yet. See the discussions here and here.
  • The Python IDE may be able to read docstrings and typing information, if it understands .pyi files.

Option 3: Use Python wrapper functions/classes for all Rust implementations.

  • There is no need to write separate .pyi files.
  • We can simply write docstrings in the wrapper.
  • The Python IDE understands docstrings and typing easily, since the wrapper is pure Python code.
  • The wrapper results in more code to maintain.
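
A minimal sketch of the wrapper pattern behind Option 3 follows. The names `_NativeCounter` and `Counter` are hypothetical: `_NativeCounter` is a pure-Python stand-in for a class that would really be implemented in Rust and exposed through PyO3, so that the example runs on its own. It is not code from this PR.

```python
class _NativeCounter:
    """Stand-in for a class that would be implemented in Rust via PyO3."""

    def __init__(self, start):
        self._value = start

    def increment(self, by):
        self._value += by
        return self._value


class Counter:
    """A counter backed by a native implementation.

    :param start: The initial value of the counter.
    """

    def __init__(self, start: int = 0) -> None:
        # The wrapper owns the native object and delegates all work to it.
        self._inner = _NativeCounter(start)

    def increment(self, by: int = 1) -> int:
        """Increase the counter and return the new value.

        :param by: The amount to add.
        :returns: The updated counter value.
        """
        return self._inner.increment(by)


counter = Counter(2)
print(counter.increment())      # delegates to the native object
print(counter.increment(by=4))
```

The docstrings and type hints live in ordinary Python source, so both Sphinx and a Python IDE can read them without a separate .pyi file. The cost is the extra delegation layer noted in the last bullet.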

Notes:

  1. PyO3 may support generating .pyi files automatically in the future. See the issue for more details.
  2. For Option 1 and Option 2, manually maintained .pyi files can be validated against the implementation using stubtest.
  3. Option 3 is the approach taken by polars, and recently by DataFusion Python (reference).

Here is a summary of all the options.

| Metric | Option 1 | Option 2 | Option 3 |
| --- | --- | --- | --- |
| Sphinx support | Yes | No | Yes |
| Idiomatic usage of Rust/Python comments | No | Yes | Yes |
| Python IDE support | Bad | Ok | Good |
| Maintenance cost | Low | High | High |

In this PR I decided to use Option 3. It has a higher maintenance cost but offers a good user experience. The maintenance cost is acceptable given that we do not have many Python functions/classes.


github-actions bot commented Aug 8, 2024

Gold Data Report

Notes
  1. The tables below show the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in gold data input processing.
  2. A positive input is a valid test case, while a negative input is a test case that is expected to fail.

Commit Information

  • Head: HEAD (f80d72072c24fbc582b28f3fc7159d8ed5e5ccf4)
  • Base: main (f99324cc1a14554104cd3dd79a19494d12420b27)

Summary

| Commit | TP | TN | FP | FN | Total |
| --- | --- | --- | --- | --- | --- |
| Head | 1068 | 190 | 22 | 950 | 2230 |
| Base | 1068 | 190 | 22 | 950 | 2230 |
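
The summary figures can be sanity-checked with a short sketch. The `GoldCounts` class and the `expected_outcome_ratio` property are hypothetical names introduced here. Treating TP + TN as the inputs whose outcome matched the expectation is the usual confusion-matrix reading and is an assumption, not something the report states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldCounts:
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def total(self) -> int:
        # Every input falls into exactly one of the four buckets.
        return self.tp + self.tn + self.fp + self.fn

    @property
    def expected_outcome_ratio(self) -> float:
        # Assumed reading: TP and TN are the inputs handled as expected.
        return (self.tp + self.tn) / self.total


head = GoldCounts(tp=1068, tn=190, fp=22, fn=950)
base = GoldCounts(tp=1068, tn=190, fp=22, fn=950)

print(head.total)                             # matches the Total column
print(round(head.expected_outcome_ratio, 3))
print(head == base)                           # True when the PR changes nothing
```

Identical Head and Base counts are what one would expect from a documentation-only change.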

Details

Gold Data Metrics
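| Group | File | Commit | TP | TN | FP | FN | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| spark | data_type.json | Head | 42 | 5 | 0 | 0 | 47 |
| | | Base | 42 | 5 | 0 | 0 | 47 |
| | expression/case.json | Head | 5 | 0 | 0 | 0 | 5 |
| | | Base | 5 | 0 | 0 | 0 | 5 |
| | expression/cast.json | Head | 4 | 0 | 0 | 0 | 4 |
| | | Base | 4 | 0 | 0 | 0 | 4 |
| | expression/current.json | Head | 2 | 0 | 0 | 0 | 2 |
| | | Base | 2 | 0 | 0 | 0 | 2 |
| | expression/date.json | Head | 4 | 1 | 0 | 0 | 5 |
| | | Base | 4 | 1 | 0 | 0 | 5 |
| | expression/interval.json | Head | 340 | 6 | 0 | 6 | 352 |
| | | Base | 340 | 6 | 0 | 6 | 352 |
| | expression/large.json | Head | 0 | 0 | 0 | 2 | 2 |
| | | Base | 0 | 0 | 0 | 2 | 2 |
| | expression/like.json | Head | 29 | 0 | 10 | 0 | 39 |
| | | Base | 29 | 0 | 10 | 0 | 39 |
| | expression/misc.json | Head | 81 | 4 | 1 | 26 | 112 |
| | | Base | 81 | 4 | 1 | 26 | 112 |
| | expression/numeric.json | Head | 29 | 7 | 0 | 2 | 38 |
| | | Base | 29 | 7 | 0 | 2 | 38 |
| | expression/string.json | Head | 16 | 1 | 0 | 2 | 19 |
| | | Base | 16 | 1 | 0 | 2 | 19 |
| | expression/timestamp.json | Head | 7 | 3 | 0 | 0 | 10 |
| | | Base | 7 | 3 | 0 | 0 | 10 |
| | expression/window.json | Head | 72 | 0 | 1 | 1 | 74 |
| | | Base | 72 | 0 | 1 | 1 | 74 |
| | function/agg.json | Head | 73 | 0 | 0 | 61 | 134 |
| | | Base | 73 | 0 | 0 | 61 | 134 |
| | function/array.json | Head | 14 | 0 | 0 | 28 | 42 |
| | | Base | 14 | 0 | 0 | 28 | 42 |
| | function/bitwise.json | Head | 4 | 0 | 0 | 7 | 11 |
| | | Base | 4 | 0 | 0 | 7 | 11 |
| | function/collection.json | Head | 0 | 0 | 0 | 9 | 9 |
| | | Base | 0 | 0 | 0 | 9 | 9 |
| | function/conditional.json | Head | 3 | 0 | 0 | 7 | 10 |
| | | Base | 3 | 0 | 0 | 7 | 10 |
| | function/conversion.json | Head | 1 | 0 | 0 | 0 | 1 |
| | | Base | 1 | 0 | 0 | 0 | 1 |
| | function/csv.json | Head | 0 | 0 | 0 | 5 | 5 |
| | | Base | 0 | 0 | 0 | 5 | 5 |
| | function/datetime.json | Head | 16 | 0 | 0 | 108 | 124 |
| | | Base | 16 | 0 | 0 | 108 | 124 |
| | function/generator.json | Head | 4 | 0 | 0 | 9 | 13 |
| | | Base | 4 | 0 | 0 | 9 | 13 |
| | function/hash.json | Head | 1 | 0 | 0 | 6 | 7 |
| | | Base | 1 | 0 | 0 | 6 | 7 |
| | function/json.json | Head | 0 | 0 | 0 | 20 | 20 |
| | | Base | 0 | 0 | 0 | 20 | 20 |
| | function/lambda.json | Head | 0 | 0 | 0 | 31 | 31 |
| | | Base | 0 | 0 | 0 | 31 | 31 |
| | function/map.json | Head | 1 | 0 | 0 | 14 | 15 |
| | | Base | 1 | 0 | 0 | 14 | 15 |
| | function/math.json | Head | 45 | 0 | 0 | 75 | 120 |
| | | Base | 45 | 0 | 0 | 75 | 120 |
| | function/misc.json | Head | 0 | 0 | 0 | 48 | 48 |
| | | Base | 0 | 0 | 0 | 48 | 48 |
| | function/predicate.json | Head | 49 | 0 | 0 | 20 | 69 |
| | | Base | 49 | 0 | 0 | 20 | 69 |
| | function/string.json | Head | 31 | 0 | 0 | 142 | 173 |
| | | Base | 31 | 0 | 0 | 142 | 173 |
| | function/struct.json | Head | 1 | 0 | 0 | 1 | 2 |
| | | Base | 1 | 0 | 0 | 1 | 2 |
| | function/url.json | Head | 0 | 0 | 0 | 5 | 5 |
| | | Base | 0 | 0 | 0 | 5 | 5 |
| | function/window.json | Head | 6 | 0 | 0 | 3 | 9 |
| | | Base | 6 | 0 | 0 | 3 | 9 |
| | function/xml.json | Head | 0 | 0 | 0 | 9 | 9 |
| | | Base | 0 | 0 | 0 | 9 | 9 |
| | plan/ddl_alter_table.json | Head | 0 | 17 | 0 | 62 | 79 |
| | | Base | 0 | 17 | 0 | 62 | 79 |
| | plan/ddl_alter_view.json | Head | 0 | 1 | 0 | 5 | 6 |
| | | Base | 0 | 1 | 0 | 5 | 6 |
| | plan/ddl_analyze_table.json | Head | 0 | 6 | 0 | 16 | 22 |
| | | Base | 0 | 6 | 0 | 16 | 22 |
| | plan/ddl_cache.json | Head | 0 | 1 | 0 | 4 | 5 |
| | | Base | 0 | 1 | 0 | 4 | 5 |
| | plan/ddl_create_index.json | Head | 0 | 0 | 0 | 3 | 3 |
| | | Base | 0 | 0 | 0 | 3 | 3 |
| | plan/ddl_create_table.json | Head | 14 | 23 | 3 | 11 | 51 |
| | | Base | 14 | 23 | 3 | 11 | 51 |
| | plan/ddl_delete_from.json | Head | 0 | 1 | 0 | 2 | 3 |
| | | Base | 0 | 1 | 0 | 2 | 3 |
| | plan/ddl_describe.json | Head | 0 | 0 | 0 | 4 | 4 |
| | | Base | 0 | 0 | 0 | 4 | 4 |
| | plan/ddl_drop_index.json | Head | 0 | 0 | 0 | 2 | 2 |
| | | Base | 0 | 0 | 0 | 2 | 2 |
| | plan/ddl_drop_view.json | Head | 5 | 0 | 0 | 0 | 5 |
| | | Base | 5 | 0 | 0 | 0 | 5 |
| | plan/ddl_insert_into.json | Head | 12 | 2 | 0 | 4 | 18 |
| | | Base | 12 | 2 | 0 | 4 | 18 |
| | plan/ddl_insert_overwrite.json | Head | 5 | 2 | 0 | 4 | 11 |
| | | Base | 5 | 2 | 0 | 4 | 11 |
| | plan/ddl_load_data.json | Head | 0 | 0 | 0 | 4 | 4 |
| | | Base | 0 | 0 | 0 | 4 | 4 |
| | plan/ddl_merge_into.json | Head | 0 | 7 | 0 | 8 | 15 |
| | | Base | 0 | 7 | 0 | 8 | 15 |
| | plan/ddl_misc.json | Head | 0 | 0 | 0 | 13 | 13 |
| | | Base | 0 | 0 | 0 | 13 | 13 |
| | plan/ddl_replace_table.json | Head | 0 | 18 | 0 | 22 | 40 |
| | | Base | 0 | 18 | 0 | 22 | 40 |
| | plan/ddl_select.json | Head | 1 | 0 | 0 | 0 | 1 |
| | | Base | 1 | 0 | 0 | 0 | 1 |
| | plan/ddl_show_views.json | Head | 0 | 0 | 0 | 7 | 7 |
| | | Base | 0 | 0 | 0 | 7 | 7 |
| | plan/ddl_uncache.json | Head | 0 | 0 | 0 | 2 | 2 |
| | | Base | 0 | 0 | 0 | 2 | 2 |
| | plan/ddl_update.json | Head | 0 | 1 | 0 | 2 | 3 |
| | | Base | 0 | 1 | 0 | 2 | 3 |
| | plan/error_alter_table.json | Head | 0 | 4 | 0 | 0 | 4 |
| | | Base | 0 | 4 | 0 | 0 | 4 |
| | plan/error_analyze_table.json | Head | 0 | 1 | 0 | 0 | 1 |
| | | Base | 0 | 1 | 0 | 0 | 1 |
| | plan/error_create_table.json | Head | 0 | 3 | 0 | 0 | 3 |
| | | Base | 0 | 3 | 0 | 0 | 3 |
| | plan/error_describe.json | Head | 0 | 1 | 0 | 0 | 1 |
| | | Base | 0 | 1 | 0 | 0 | 1 |
| | plan/error_join.json | Head | 0 | 2 | 0 | 0 | 2 |
| | | Base | 0 | 2 | 0 | 0 | 2 |
| | plan/error_load_data.json | Head | 0 | 1 | 0 | 0 | 1 |
| | | Base | 0 | 1 | 0 | 0 | 1 |
| | plan/error_misc.json | Head | 0 | 11 | 0 | 0 | 11 |
| | | Base | 0 | 11 | 0 | 0 | 11 |
| | plan/error_order_by.json | Head | 1 | 3 | 0 | 0 | 4 |
| | | Base | 1 | 3 | 0 | 0 | 4 |
| | plan/error_select.json | Head | 0 | 10 | 0 | 0 | 10 |
| | | Base | 0 | 10 | 0 | 0 | 10 |
| | plan/error_with.json | Head | 0 | 1 | 0 | 0 | 1 |
| | | Base | 0 | 1 | 0 | 0 | 1 |
| | plan/plan_alter_view.json | Head | 0 | 2 | 0 | 0 | 2 |
| | | Base | 0 | 2 | 0 | 0 | 2 |
| | plan/plan_create_view.json | Head | 0 | 2 | 0 | 0 | 2 |
| | | Base | 0 | 2 | 0 | 0 | 2 |
| | plan/plan_explain.json | Head | 0 | 1 | 1 | 0 | 2 |
| | | Base | 0 | 1 | 1 | 0 | 2 |
| | plan/plan_group_by.json | Head | 2 | 1 | 0 | 8 | 11 |
| | | Base | 2 | 1 | 0 | 8 | 11 |
| | plan/plan_hint.json | Head | 25 | 0 | 3 | 0 | 28 |
| | | Base | 25 | 0 | 3 | 0 | 28 |
| | plan/plan_insert_into.json | Head | 3 | 0 | 0 | 0 | 3 |
| | | Base | 3 | 0 | 0 | 0 | 3 |
| | plan/plan_insert_overwrite.json | Head | 1 | 0 | 0 | 1 | 2 |
| | | Base | 1 | 0 | 0 | 1 | 2 |
| | plan/plan_join.json | Head | 39 | 3 | 0 | 20 | 62 |
| | | Base | 39 | 3 | 0 | 20 | 62 |
| | plan/plan_misc.json | Head | 6 | 6 | 0 | 16 | 28 |
| | | Base | 6 | 6 | 0 | 16 | 28 |
| | plan/plan_order_by.json | Head | 2 | 3 | 0 | 13 | 18 |
| | | Base | 2 | 3 | 0 | 13 | 18 |
| | plan/plan_select.json | Head | 40 | 17 | 2 | 55 | 114 |
| | | Base | 40 | 17 | 2 | 55 | 114 |
| | plan/plan_set_operation.json | Head | 14 | 0 | 0 | 3 | 17 |
| | | Base | 14 | 0 | 0 | 3 | 17 |
| | plan/plan_with.json | Head | 0 | 1 | 0 | 5 | 6 |
| | | Base | 0 | 1 | 0 | 5 | 6 |
| | plan/unpivot_join.json | Head | 4 | 0 | 0 | 0 | 4 |
| | | Base | 4 | 0 | 0 | 0 | 4 |
| | plan/unpivot_select.json | Head | 7 | 6 | 0 | 7 | 20 |
| | | Base | 7 | 6 | 0 | 7 | 20 |
| | table_schema.json | Head | 7 | 5 | 1 | 0 | 13 |
| | | Base | 7 | 5 | 1 | 0 | 13 |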


github-actions bot commented Aug 8, 2024

Spark Test Report

Commit Information

  • Head: f80d720 (refs/pull/123/merge)
  • Base: f99324c (f99324cc1a14554104cd3dd79a19494d12420b27)

Test Summary

Suite Commit Failed Passed Skipped Warnings Time (s)
doctest-column Head 10 23 2 5.49
Base 10 23 2 5.21
doctest-dataframe Head 64 42 1 3 6.46
Base 63 43 1 3 6.31
doctest-functions Head 322 81 6 7 9.41
Base 322 81 6 7 8.74
test-connect Head 484 554 126 243 93.73
Base 484 554 126 243 91.58
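
The Failed column above can be cross-checked against the Error Counts total that follows. This sketch assumes each failed test contributes exactly one entry to that total. The report does not state this, but the arithmetic is consistent with it. The dictionary names are hypothetical.

```python
# Failed counts per suite, copied from the first numeric column of the table.
failed_head = {
    "doctest-column": 10,
    "doctest-dataframe": 64,
    "doctest-functions": 322,
    "test-connect": 484,
}
failed_base = {
    "doctest-column": 10,
    "doctest-dataframe": 63,
    "doctest-functions": 322,
    "test-connect": 484,
}

head_total = sum(failed_head.values())
base_total = sum(failed_base.values())

# Suites whose failure count differs between Head and Base.
changed = {
    suite: failed_head[suite] - failed_base[suite]
    for suite in failed_head
    if failed_head[suite] != failed_base[suite]
}

print(head_total, base_total, changed)
```

Under that assumption, the one extra failure in `doctest-dataframe` corresponds to the `(+1)` shown next to the error total.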

Test Details

Error Counts
(+1)      880 Total
          417 Total Unique
-------- ---- ----------------------------------------------------------------------------------------------------------
(+1)       53 DocTestFailure
           27 UnsupportedOperationException: map partitions
           23 UnsupportedOperationException: co-group map
           23 UnsupportedOperationException: group map
           17 PySparkAssertionError: [DIFFERENT_PANDAS_DATAFRAME] DataFrames are not almost equal:
           15 UnsupportedOperationException: streaming query manager command
           13 UnsupportedOperationException: inline user defined window function
           13 UnsupportedOperationException: lambda function
           12 UnsupportedOperationException: function: assert_true
           12 UnsupportedOperationException: function: randn
           10 UnsupportedOperationException: unsupported data source format: Some("text")
            9 handle add artifacts
            8 PythonException: 
            8 UnsupportedOperationException: handle analyze tree string
            8 UnsupportedOperationException: hint
            7 AnalysisException: Error during planning: Table function 'range' not found
            7 AssertionError: AnalysisException not raised
            7 AssertionError: False is not true
            7 SparkRuntimeException: type_coercion
            6 IllegalArgumentException: invalid argument: UDF function type must be Python UDF
            6 PythonException:  KeyError: 0
            6 UnsupportedOperationException: function in table factor
            6 UnsupportedOperationException: update fields
            6 UnsupportedOperationException: write stream operation start
            5 AnalysisException: Error during planning: cannot resolve attribute: ObjectName([Identifier("name")])
            5 IllegalArgumentException: invalid argument: expecting either join condition or using columns
            5 UnsupportedOperationException: corr
            5 UnsupportedOperationException: fill na
            5 UnsupportedOperationException: function: monotonically_increasing_id
            4 AnalysisException: No field named "#3". Valid fields are "#0".
            4 PySparkNotImplementedError: [NOT_IMPLEMENTED] rdd() is not implemented.
            4 UnsupportedOperationException: cov
            4 UnsupportedOperationException: drop na
            4 UnsupportedOperationException: function: to_binary
            4 UnsupportedOperationException: function: window
            4 UnsupportedOperationException: interval unit: day-time
            4 UnsupportedOperationException: replace
            4 UnsupportedOperationException: sample
            4 UnsupportedOperationException: sample by
            4 UnsupportedOperationException: unknown function: hll_sketch_agg
            4 UnsupportedOperationException: unpivot
            3 AnalysisException: Error during planning: Inconsistent data type across values list at row 2 column ...
            3 AnalysisException: Error during planning: The expression to get an indexed field is only valid for `...
            3 AnalysisException: Error during planning: cannot resolve attribute: ObjectName([Identifier("id")])
            3 AnalysisException: Error during planning: cannot resolve attribute: ObjectName([Identifier("k")])
            3 AnalysisException: Execution error: Date part 'D' not supported
            3 AssertionError: "[('a', [('b', 'c')])]" != "{'a': {'b': 'c'}}"
            3 AssertionError: PythonException not raised
            3 IllegalArgumentException: expected value at line 1 column 1
            3 PythonException:  ArrowException: Invalid argument error: must either specify a row count or at leas...
            3 SparkRuntimeException: External error: PySparkRuntimeError: [STOP_ITERATION_OCCURRED] Caught StopIte...
            3 SparkRuntimeException: Internal error: UDF returned a different number of rows than expected. Expect...
            3 SparkRuntimeException: Internal error: contains should only be called with two string arguments, got...
            3 SparkRuntimeException: Object Store error: Object at location /home/runner/work/sail/sail/python/tes...
            3 UnsupportedOperationException: crosstab
            3 UnsupportedOperationException: expression: Tuple([Value(SingleQuotedString("value")), Value(Number("...
            3 UnsupportedOperationException: function: base64
            3 UnsupportedOperationException: function: hypot
            3 UnsupportedOperationException: function: length
            3 UnsupportedOperationException: function: named_struct
            3 UnsupportedOperationException: function: to_date
            3 UnsupportedOperationException: function: unbase64
            3 UnsupportedOperationException: function: ~
            3 UnsupportedOperationException: handle analyze input files
            3 UnsupportedOperationException: to schema
            3 ValueError: Converting to Python dictionary is not supported when duplicate field names are present
            2 AnalysisException: Cannot cast from struct to other types except struct
            2 AnalysisException: Cannot cast list to non-list data types
            2 AnalysisException: Cannot cast to Decimal128(14, 7). Overflowing on NaN
            2 AnalysisException: Error during planning: Error during planning: Coercion from [Int64, Boolean] to t...
            2 AnalysisException: Error during planning: failed to resolve schema: global_temp
            2 AnalysisException: Error during planning: two values expected: [Alias(Alias { expr: Column(Column { ...
            2 AnalysisException: Error during planning: zero values expected: [Literal(Int32(0))]
            2 AssertionError
            2 AssertionError: "TABLE_OR_VIEW_NOT_FOUND" does not match "Error during planning: cannot resolve attr...
            2 AssertionError: Lists differ: [Row([15 chars]lue=1.0)] != [Row([15 chars]lue=1), Row(key='count', va...
            2 AssertionError: Lists differ: [Row(value='0'), Row(value='1'), Row(value='10[1639 chars]99')] != [Ro...
            2 AssertionError: True is not false
            2 FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/sail/sail/.venvs/test/lib...
            2 FileNotFoundError: [Errno 2] No such file or directory: '/home/runner/work/sail/sail/.venvs/test/lib...
            2 IllegalArgumentException: invalid argument: empty data source paths
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: id at Line: 1, Col...
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: id at Line: 1, Col...
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: id at Line: 3, Col...
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: id at Line: 5, Col...
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: t at Line: 3, Colu...
            2 IllegalArgumentException: invalid argument: sql parser error: Expected: ), found: v at Line: 1, Colu...
            2 IllegalArgumentException: invalid type: integer `99`, expected a string at line 1 column 10
            2 SparkRuntimeException: External error: KeyError: 0
            2 SparkRuntimeException: Object Store error: Object at location /home/runner/work/sail/sail/python/tes...
            2 SparkRuntimeException: PySparkUDTFVisitor Pickle Error: PySparkRuntimeError: [UDTF_EXEC_ERROR] User ...
            2 SparkRuntimeException: PySparkUDTFVisitor Pickle Error: PySparkRuntimeError: [UDTF_EXEC_ERROR] User ...
            2 UnsupportedOperationException: Only literal expr are supported in Python UDTFs for now, got expr: ma...
            2 UnsupportedOperationException: Only literal expr are supported in Python UDTFs for now, got expr: ma...
            2 UnsupportedOperationException: Physical plan does not support logical expression SimilarTo(Like { ne...
            2 UnsupportedOperationException: PlanNode::IsCached
            2 UnsupportedOperationException: SQL drop function
            2 UnsupportedOperationException: SQL set variable
            2 UnsupportedOperationException: approx quantile
            2 UnsupportedOperationException: collect metrics
            2 UnsupportedOperationException: decimal literal with precision or scale
            2 UnsupportedOperationException: describe
            2 UnsupportedOperationException: freq items
            2 UnsupportedOperationException: function: add_months
            2 UnsupportedOperationException: function: array_position
            2 UnsupportedOperationException: function: bit_length
            2 UnsupportedOperationException: function: bitmap_bit_position
            2 UnsupportedOperationException: function: crc32
            2 UnsupportedOperationException: function: date_add
            2 UnsupportedOperationException: function: date_sub
            2 UnsupportedOperationException: function: dayofweek
            2 UnsupportedOperationException: function: encode
            2 UnsupportedOperationException: function: format_number
            2 UnsupportedOperationException: function: from_csv
            2 UnsupportedOperationException: function: from_json
            2 UnsupportedOperationException: function: inline
            2 UnsupportedOperationException: function: least
            2 UnsupportedOperationException: function: levenshtein
            2 UnsupportedOperationException: function: make_date
            2 UnsupportedOperationException: function: map_from_arrays
            2 UnsupportedOperationException: function: map_keys
            2 UnsupportedOperationException: function: nanvl
            2 UnsupportedOperationException: function: octet_length
            2 UnsupportedOperationException: function: overlay
            2 UnsupportedOperationException: function: sec
            2 UnsupportedOperationException: function: shiftrightunsigned
            2 UnsupportedOperationException: function: slice
            2 UnsupportedOperationException: function: timestamp_seconds
            2 UnsupportedOperationException: function: xxhash64
            2 UnsupportedOperationException: handle analyze same semantics
            2 UnsupportedOperationException: tail
            2 UnsupportedOperationException: unknown function: approx_count_distinct
            2 UnsupportedOperationException: unknown function: collect_set
            2 UnsupportedOperationException: unresolved regex
            2 UnsupportedOperationException: unsupported data source format: Some("orc")
            2 UnsupportedOperationException: user defined data type should only exist in a field
            2 UnsupportedOperationException: write operation v2
            2 handle artifact statuses
            1 AnalysisException: Cannot cast string 'abc' to value of Float64 type
            1 AnalysisException: Cannot cast value 'abc' to value of Boolean type
            1 AnalysisException: Error during planning: Error during planning: Coercion from [Int64, Boolean] to t...
            1 AnalysisException: Error during planning: Error during planning: Coercion from [List(Field { name: "...
            1 AnalysisException: Error during planning: Error during planning: Coercion from [Null, Boolean] to th...
            1 AnalysisException: Error during planning: Error during planning: Coercion from [Utf8, Boolean] to th...
            1 AnalysisException: Error during planning: Error during planning: Coercion from [Utf8, Boolean] to th...
            1 AnalysisException: Error during planning: Inconsistent data type across values list at row 1 column ...
            1 AnalysisException: Error during planning: Inconsistent data type across values list at row 2 column ...
            1 AnalysisException: Error during planning: Inserting query must have the same schema with the table.
            1 AnalysisException: Error during planning: Invalid qualifier b
            1 AnalysisException: Error during planning: No function matches the given name and argument types 'NTH...
            1 AnalysisException: Error during planning: The expression to get an indexed field is only valid for `...
            1 AnalysisException: Error during planning: cannot resolve attribute: ObjectName([Identifier("df_as1.n...
            1 AnalysisException: Error during planning: cannot resolve attribute: ObjectName([Identifier("moded")]...
            1 AnalysisException: Error during planning: three values expected: [Alias(Alias { expr: Column(Column ...
            1 AnalysisException: Error during planning: three values expected: [Literal(Int32(1)), Literal(Int32(3...
            1 AnalysisException: Error during planning: zero values expected: [Literal(Int32(42))]
            1 AnalysisException: Execution error: Error parsing timestamp from '1997-02-28 10:30:00' using format ...
            1 AnalysisException: Execution error: Error parsing timestamp from '2015-04-08' using format 'yyyy-MM-...
            1 AnalysisException: Execution error: Error parsing timestamp from '2023-01-01' using format 'dd-MM-yy...
            1 AnalysisException: Execution error: The UPPER function can only accept strings, but got Int64.
            1 AnalysisException: Invalid or Unsupported Configuration: could not find config namespace for key "ig...
            1 AnalysisException: Invalid or Unsupported Configuration: could not find config namespace for key "li...
            1 AnalysisException: No field named "#3". Valid fields are "#0", "v % Int32(2)".
            1 AnalysisException: No field named "#3". Valid fields are "v % Int32(2)".
            1 AnalysisException: No field named "?table?"."#1". Valid fields are "?table?"."#0".
            1 AnalysisException: No field named tbl."#2". Valid fields are tbl."#3".
            1 AnalysisException: Schema contains duplicate unqualified field name "#0"
(+1)        1 AssertionError: "Database 'memory:ccce23be-376d-429c-91cc-8e72f387cc00' dropped." does not match "in...
(+1)        1 AssertionError: "Database 'memory:d586348e-c350-44c0-ae9b-7a1561a92b6e' dropped." does not match "in...
            1 AssertionError: "Exception thrown when converting pandas.Series" does not match "
            1 AssertionError: "Exception thrown when converting pandas.Series" does not match "expected value at l...
            1 AssertionError: "PickleException" does not match "
            1 AssertionError: "UDTF_ARROW_TYPE_CAST_ERROR" does not match "
            1 AssertionError: "[('a', 'b')]" != "{'a': 'b'}"
            1 AssertionError: "aggregate function.*argument.*aggregate function" does not match "No field named "#...
            1 AssertionError: "foobar" does not match "function: raise_error"
            1 AssertionError: "timestamp values are not equal (timestamp='1969-01-01 09:01:01+08:00': data[0][1]='...
            1 AssertionError: 1 != 2
            1 AssertionError: AnalysisException not raised by <lambda>
            1 AssertionError: BinaryType() != NullType()
            1 AssertionError: DataFrame.iloc[:, 0] (column name="struct") are different
            1 AssertionError: Exception not raised
            1 AssertionError: Exception not raised by <lambda>
            1 AssertionError: IllegalArgumentException not raised
            1 AssertionError: Items in the second set but not the first:
            1 AssertionError: Lists differ: [Row([22 chars]e(2018, 12, 31, 16, 0), aware=datetime.datetim[16 chars...
            1 AssertionError: Lists differ: [Row([49 chars] 1), internal_value=-31532339000000000), Row(i[225 char...
            1 AssertionError: Lists differ: [Row(ln(id)=-inf, ln(id)=-inf, struct(id, name)=Row(i[1232 chars]9'))]...
            1 AssertionError: Lists differ: [] != [Column(name='age', description=None, data[165 chars]lse)]
            1 AssertionError: Row(point='[1.0, 2.0]', pypoint='[3.0, 4.0]') != Row(point='(1.0, 2.0)', pypoint='[3...
            1 AssertionError: Row(res="[('personal', [('name', 'John'), ('city', 'New York')])]") != Row(res="{'pe...
            1 AssertionError: StorageLevel(False, True, True, False, 1) != StorageLevel(False, False, False, False...
            1 AssertionError: Struc[31 chars]stampNTZType(), True), StructField('val', Inte[13 chars]ue)]) != Stru...
            1 AssertionError: Struc[32 chars]e(), False), StructField('b', DoubleType(), Fa[158 chars]ue)]) != Str...
            1 AssertionError: Struc[40 chars]ue), StructField('val', ArrayType(DoubleType(), False), True)]) != St...
            1 AssertionError: YearMonthIntervalType(0, 1) != YearMonthIntervalType(0, 0)
            1 AssertionError: [1.0, 2.0] != ExamplePoint(1.0,2.0)
            1 AttributeError: 'DataFrame' object has no attribute '_ipython_key_completions_'
            1 AttributeError: 'DataFrame' object has no attribute '_joinAsOf'
            1 Error, message length too large: found 29002145 bytes, the limit is: 4194304 bytes
            1 Exception iterating requests!
(+1)        1 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpw5gm8cju'
(+1)        1 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpwl2nrcdo'
            1 IllegalArgumentException: 83140 is too large to store in a Decimal128 of precision 4. Max is 9999
            1 IllegalArgumentException: column types must match schema types, expected List(Field { name: "element...
            1 IllegalArgumentException: column types must match schema types, expected List(Field { name: "element...
            1 IllegalArgumentException: column types must match schema types, expected List(Field { name: "item", ...
            1 IllegalArgumentException: invalid argument: Table my_tab not found
            1 IllegalArgumentException: invalid argument: sql parser error: Expected: (, found: EOF
            1 IllegalArgumentException: invalid type: integer `5`, expected a string at line 1 column 13
            1 IndexError: single positional indexer is out-of-bounds
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] foreach() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] foreachPartition() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] localCheckpoint() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] sparkContext() is not implemented.
            1 PySparkNotImplementedError: [NOT_IMPLEMENTED] toJSON() is not implemented.
            1 PythonException:  ArrowTypeError: ("Expected dict key of type str or bytes, got 'int'", 'Conversion ...
            1 PythonException:  AttributeError: 'NoneType' object has no attribute 'partitionId'
            1 PythonException:  TypeError: 'NoneType' object is not subscriptable
            1 SparkRuntimeException: External error: AttributeError: 'list' object has no attribute 'y'
            1 SparkRuntimeException: External error: TypeError: TypesTestsMixin.test_complex_nested_udt_in_df.<loc...
            1 SparkRuntimeException: Object Store error: Object at location /home/runner/work/sail/sail/python/tes...
            1 SparkRuntimeException: Object Store error: Object at location /home/runner/work/sail/sail/python/tes...
            1 UnsupportedOperationException: Can't create a scalar from array of type "Map(Field { name: "entries"...
            1 UnsupportedOperationException: Insert into not implemented for this table
            1 UnsupportedOperationException: Physical plan does not support logical expression SimilarTo(Like { ne...
            1 UnsupportedOperationException: Physical plan does not support logical expression SimilarTo(Like { ne...
            1 UnsupportedOperationException: Physical plan does not support logical expression SimilarTo(Like { ne...
            1 UnsupportedOperationException: Physical plan does not support logical expression Sort(Sort { expr: C...
            1 UnsupportedOperationException: SQL show functions
            1 UnsupportedOperationException: SQL show tables
            1 UnsupportedOperationException: Unsupported statement: ShowVariable { variable: [Ident { value: "DATA...
            1 UnsupportedOperationException: bucketing
            1 UnsupportedOperationException: call function
            1 UnsupportedOperationException: deduplicate within watermark
            1 UnsupportedOperationException: function: array_compact
            1 UnsupportedOperationException: function: array_insert
            1 UnsupportedOperationException: function: array_join
            1 UnsupportedOperationException: function: array_max
            1 UnsupportedOperationException: function: array_min
            1 UnsupportedOperationException: function: array_size
            1 UnsupportedOperationException: function: array_sort
            1 UnsupportedOperationException: function: arrays_overlap
            1 UnsupportedOperationException: function: arrays_zip
            1 UnsupportedOperationException: function: bin
            1 UnsupportedOperationException: function: bit_count
            1 UnsupportedOperationException: function: bit_get
            1 UnsupportedOperationException: function: bitmap_bucket_number
            1 UnsupportedOperationException: function: bround
            1 UnsupportedOperationException: function: btrim
            1 UnsupportedOperationException: function: cardinality
            1 UnsupportedOperationException: function: char
            1 UnsupportedOperationException: function: char_length
            1 UnsupportedOperationException: function: character_length
            1 UnsupportedOperationException: function: coalesce
            1 UnsupportedOperationException: function: concat
            1 UnsupportedOperationException: function: concat_ws
            1 UnsupportedOperationException: function: conv
            1 UnsupportedOperationException: function: convert_timezone
            1 UnsupportedOperationException: function: csc
            1 UnsupportedOperationException: function: current_catalog
            1 UnsupportedOperationException: function: current_database
            1 UnsupportedOperationException: function: current_schema
            1 UnsupportedOperationException: function: current_timestamp
            1 UnsupportedOperationException: function: current_timezone
            1 UnsupportedOperationException: function: date_diff
            1 UnsupportedOperationException: function: date_format
            1 UnsupportedOperationException: function: date_from_unix_date
            1 UnsupportedOperationException: function: dateadd
            1 UnsupportedOperationException: function: datediff
            1 UnsupportedOperationException: function: day
            1 UnsupportedOperationException: function: dayofmonth
            1 UnsupportedOperationException: function: dayofyear
            1 UnsupportedOperationException: function: decode
            1 UnsupportedOperationException: function: e
            1 UnsupportedOperationException: function: element_at
            1 UnsupportedOperationException: function: elt
            1 UnsupportedOperationException: function: equal_null
            1 UnsupportedOperationException: function: expm1
            1 UnsupportedOperationException: function: find_in_set
            1 UnsupportedOperationException: function: format_string
            1 UnsupportedOperationException: function: from_unixtime
            1 UnsupportedOperationException: function: from_utc_timestamp
            1 UnsupportedOperationException: function: get_json_object
            1 UnsupportedOperationException: function: getbit
            1 UnsupportedOperationException: function: greatest
            1 UnsupportedOperationException: function: hash
            1 UnsupportedOperationException: function: hex
            1 UnsupportedOperationException: function: hour
            1 UnsupportedOperationException: function: ifnull
            1 UnsupportedOperationException: function: initcap
            1 UnsupportedOperationException: function: inline_outer
            1 UnsupportedOperationException: function: instr
            1 UnsupportedOperationException: function: java_method
            1 UnsupportedOperationException: function: json_array_length
            1 UnsupportedOperationException: function: json_object_keys
            1 UnsupportedOperationException: function: json_tuple
            1 UnsupportedOperationException: function: last_day
            1 UnsupportedOperationException: function: left
            1 UnsupportedOperationException: function: locate
            1 UnsupportedOperationException: function: log1p
            1 UnsupportedOperationException: function: lpad
            1 UnsupportedOperationException: function: ltrim
            1 UnsupportedOperationException: function: make_dt_interval
            1 UnsupportedOperationException: function: make_interval
            1 UnsupportedOperationException: function: make_timestamp
            1 UnsupportedOperationException: function: make_timestamp_ltz
            1 UnsupportedOperationException: function: make_timestamp_ntz
            1 UnsupportedOperationException: function: make_ym_interval
            1 UnsupportedOperationException: function: map_concat
            1 UnsupportedOperationException: function: map_entries
            1 UnsupportedOperationException: function: map_from_entries
            1 UnsupportedOperationException: function: map_values
            1 UnsupportedOperationException: function: mask
            1 UnsupportedOperationException: function: minute
            1 UnsupportedOperationException: function: month
            1 UnsupportedOperationException: function: months_between
            1 UnsupportedOperationException: function: next_day
            1 UnsupportedOperationException: function: nullif
            1 UnsupportedOperationException: function: nvl
            1 UnsupportedOperationException: function: nvl2
            1 UnsupportedOperationException: function: parse_url
            1 UnsupportedOperationException: function: pi
            1 UnsupportedOperationException: function: pmod
            1 UnsupportedOperationException: function: position
            1 UnsupportedOperationException: function: positive
            1 UnsupportedOperationException: function: printf
            1 UnsupportedOperationException: function: quarter
            1 UnsupportedOperationException: function: reflect
            1 UnsupportedOperationException: function: regexp_count
            1 UnsupportedOperationException: function: regexp_extract
            1 UnsupportedOperationException: function: regexp_extract_all
            1 UnsupportedOperationException: function: regexp_instr
            1 UnsupportedOperationException: function: regexp_substr
            1 UnsupportedOperationException: function: repeat
            1 UnsupportedOperationException: function: replace
            1 UnsupportedOperationException: function: reverse
            1 UnsupportedOperationException: function: right
            1 UnsupportedOperationException: function: rint
            1 UnsupportedOperationException: function: rpad
            1 UnsupportedOperationException: function: rtrim
            1 UnsupportedOperationException: function: schema_of_csv
            1 UnsupportedOperationException: function: schema_of_json
            1 UnsupportedOperationException: function: second
            1 UnsupportedOperationException: function: sentences
            1 UnsupportedOperationException: function: session_window
            1 UnsupportedOperationException: function: sha
            1 UnsupportedOperationException: function: sha1
            1 UnsupportedOperationException: function: sha2
            1 UnsupportedOperationException: function: sign
            1 UnsupportedOperationException: function: size
            1 UnsupportedOperationException: function: sort_array
            1 UnsupportedOperationException: function: soundex
            1 UnsupportedOperationException: function: spark_partition_id
            1 UnsupportedOperationException: function: split
            1 UnsupportedOperationException: function: split_part
            1 UnsupportedOperationException: function: stack
            1 UnsupportedOperationException: function: str_to_map
            1 UnsupportedOperationException: function: timestamp_micros
            1 UnsupportedOperationException: function: timestamp_millis
            1 UnsupportedOperationException: function: to_char
            1 UnsupportedOperationException: function: to_csv
            1 UnsupportedOperationException: function: to_json
            1 UnsupportedOperationException: function: to_number
            1 UnsupportedOperationException: function: to_unix_timestamp
            1 UnsupportedOperationException: function: to_utc_timestamp
            1 UnsupportedOperationException: function: to_varchar
            1 UnsupportedOperationException: function: translate
            1 UnsupportedOperationException: function: trim
            1 UnsupportedOperationException: function: trunc
            1 UnsupportedOperationException: function: try_add
            1 UnsupportedOperationException: function: try_divide
            1 UnsupportedOperationException: function: try_element_at
            1 UnsupportedOperationException: function: try_multiply
            1 UnsupportedOperationException: function: try_subtract
            1 UnsupportedOperationException: function: try_to_binary
            1 UnsupportedOperationException: function: try_to_number
            1 UnsupportedOperationException: function: try_to_timestamp
            1 UnsupportedOperationException: function: typeof
            1 UnsupportedOperationException: function: unhex
            1 UnsupportedOperationException: function: unix_micros
            1 UnsupportedOperationException: function: unix_millis
            1 UnsupportedOperationException: function: unix_seconds
            1 UnsupportedOperationException: function: url_decode
            1 UnsupportedOperationException: function: url_encode
            1 UnsupportedOperationException: function: weekday
            1 UnsupportedOperationException: function: weekofyear
            1 UnsupportedOperationException: function: width_bucket
            1 UnsupportedOperationException: function: xpath
            1 UnsupportedOperationException: function: xpath_boolean
            1 UnsupportedOperationException: function: xpath_double
            1 UnsupportedOperationException: function: xpath_float
            1 UnsupportedOperationException: function: xpath_int
            1 UnsupportedOperationException: function: xpath_long
            1 UnsupportedOperationException: function: xpath_number
            1 UnsupportedOperationException: function: xpath_short
            1 UnsupportedOperationException: function: xpath_string
            1 UnsupportedOperationException: function: year
            1 UnsupportedOperationException: handle analyze semantic hash
            1 UnsupportedOperationException: list functions
            1 UnsupportedOperationException: summary
            1 UnsupportedOperationException: timestamp
            1 UnsupportedOperationException: unknown function: any_value
            1 UnsupportedOperationException: unknown function: count_if
            1 UnsupportedOperationException: unknown function: count_min_sketch
            1 UnsupportedOperationException: unknown function: distributed_sequence_id
            1 UnsupportedOperationException: unknown function: grouping_id
            1 UnsupportedOperationException: unknown function: histogram_numeric
            1 UnsupportedOperationException: unknown function: kurtosis
            1 UnsupportedOperationException: unknown function: max_by
            1 UnsupportedOperationException: unknown function: min_by
            1 UnsupportedOperationException: unknown function: mode
            1 UnsupportedOperationException: unknown function: product
            1 UnsupportedOperationException: unknown function: skewness
            1 UnsupportedOperationException: unknown function: try_avg
            1 UnsupportedOperationException: unknown function: try_sum
            1 internal error: unknown attribute in plan 203: a
(-1)        0 AssertionError: "Database 'memory:a813712e-491c-4557-9c52-ad9fd80ae56e' dropped." does not match "in...
(-1)        0 AssertionError: "Database 'memory:b0cdd9f5-10e5-4b89-8bed-821df8336eae' dropped." does not match "in...
(-1)        0 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpafmn9urd'
(-1)        0 FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpcnt4re4b'
Passed Tests Diff
--- base	2024-08-08 11:40:41.244046228 +0000
+++ head	2024-08-08 11:40:41.356046969 +0000
@@ -62 +61,0 @@
-.venvs/test/lib/python3.11/site-packages/pyspark/sql/dataframe.py::pyspark.sql.dataframe.DataFrame.withColumns

@shehabgamin shehabgamin left a comment

This is huge, great work!! 🚀

@shehabgamin shehabgamin merged commit f4b2eb5 into main Aug 8, 2024
5 checks passed
@shehabgamin shehabgamin deleted the python-docs branch August 8, 2024 21:26

2 participants