Move DataFrame to machinelearning #5641

Merged 54 commits on Mar 11, 2021

Commits on Nov 6, 2019

  1. Change namespace to Microsoft.Data.Analysis (dotnet#2773)

    * Update namespace to Microsoft.Data.Analysis
    
    * Remove "DataFrame" from the test project name
    Prashanth Govindarajan authored Nov 6, 2019
    df1661b

Commits on Nov 7, 2019

  1. APIs for reversed binary operators (dotnet#2769)

    * Support reverse binary operators
    
    * Fix file left behind in a rebase
    
    * Fix whitespace
    Prashanth Govindarajan authored Nov 7, 2019
    f09c30c

Commits on Nov 8, 2019

  1. Throw for incompatible inPlace (dotnet#2778)

    * Throw if inPlace is set and types mismatch
    
    * Unit test
    
    * Better error message
    
    * Remove empty lines
    Prashanth Govindarajan authored Nov 8, 2019
    afe3e61
  2. Version, Tags and Description for Nuget (dotnet#2779)

    * Version, Tags and Description for Nuget
    
    * sq
    Prashanth Govindarajan authored Nov 8, 2019
    5c4b1b4

Commits on Nov 9, 2019

  1. Flags for release (dotnet#2781)

    * Publish packages to artifacts
    
    * Flags for release
    Prashanth Govindarajan authored Nov 9, 2019
    5de343c

Commits on Nov 22, 2019

  1. Fix the Description method to not throw (dotnet#2786)

    * Fix the Description method to not crash
    Adds an Info method
    
    * sq
    
    * Address feedback
    
    * Last round of feedback
    Prashanth Govindarajan authored Nov 22, 2019
    81f3d42

Commits on Dec 3, 2019

  1. Use dataTypes if it is passed in to LoadCsv (dotnet#2791)

    * Fix LoadCsv to use dataType if it is passed in
    
    * sq
    
    * Don't read the full file after guessRows lines have been read
    
    * Address feedback
    
    * Last round of feedback
    Prashanth Govindarajan authored Dec 3, 2019
    c6eb2f7

Commits on Dec 5, 2019

  1. Creating a Rows property, similar to Columns (dotnet#2794)

    * Rows collection, similar to Columns
    
    * Doc
    
    * Some minor clean up
    
    * Make DataFrameRow a view into the DataFrame
    
    * sq
    
    * Address feedback
    
    * Remove DataFrame.RowCount
    
    * More row count changes
    
    * sq
    
    * Address feedback
    
    * Merge upstream
    Prashanth Govindarajan authored Dec 5, 2019
    7cee9d9
  2. DataFrame.LoadCsv throws an exception on projects targeting < netcore3.0 (dotnet#2797)
    
    Fixing by passing in an encoding and a default buffer size.
    
    Also, get our tests running on .NET Framework.
    
    Fix dotnet#2783
    eerhardt authored Dec 5, 2019
    e64cbad

Commits on Dec 6, 2019

  1. Params constructor on DataFrame (dotnet#2800)

    * Params constructor on DataFrame
    
    * Delete redundant constructors
    Prashanth Govindarajan authored Dec 6, 2019
    303ba62

Commits on Dec 12, 2019

  1. Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations (dotnet#2801)
    
    * Remove T : unmanaged constraint from DataFrameColumn.BinaryOperations
    
    * Address feedback
    
    * Rename the value version of the APIs
    
    * sq
    
    * Fix build
    
    * Address feedback
    
    * Remove Value from the APIs
    
    * sq
    
    * Address feedback
    Prashanth Govindarajan authored Dec 12, 2019
    dc4f9b0
  2. Bump version to 0.2.0 (dotnet#2803)

    Prashanth Govindarajan authored Dec 12, 2019
    838350b

Commits on Jan 13, 2020

  1. Add Apply<TResult> method to PrimitiveDataFrameColumn (dotnet#2807)

    * Add Apply method to PrimitiveDataFrameColumn and its container
    
    * Add TestApply test
    
    * Remove unused df variable in DataFrameTests
    
    * Add xml doc comments to Apply method
    zHaytam authored and Prashanth Govindarajan committed Jan 13, 2020
    0fa210d

Commits on Jan 16, 2020

  1. Add additional tests for ReadCsv (dotnet#2811)

    * Add additional tests for ReadCsv
    
    * Update asserts
    
    * Add empty row and skip test pending another fix
    
    * Remove test for another issue
    jwood803 authored and Prashanth Govindarajan committed Jan 16, 2020
    430ac09

Commits on Jan 21, 2020

  1. Added static factory methods to DataFrameColumn (dotnet#2808)

    * Added static factory methods to DataFrameColumn where they make sense (for the overloads where it's possible to infer the column's type).
    
    * Remove regions
    
    * Update some parts of the unit tests to use static factory methods to create DataFrameColumns.
    
    * Remove errant {T} on StringDataFrameColumn.
    
    * PR feedback
    
    Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
    2 people authored and Prashanth Govindarajan committed Jan 21, 2020
    70bb9e9

Commits on Jan 28, 2020

  1. Append rows to a DataFrame (dotnet#2823)

    * Append rows to a DataFrame
    
    * Unit test
    
    * Update unit tests and doc
    
    * Need to perform a type check every time
    
    * sq
    
    * Update unit test
    
    * Address comments
    Prashanth Govindarajan authored and msftbot[bot] committed Jan 28, 2020
    82c315f

Commits on Feb 7, 2020

  1. Move corefxlab to arcade (dotnet#2795)

    * Add eng folder
    
    * First cut of moving corefxlab to arcade
    
    * Move arcade symbol validation inside official build
    
    * Move base yml file to root
    
    * Arcade will build, publish packages and symbols
    
    * UpdateXlf. Review this
    
    * Arcade Update to version 5.0.0-beta.19575.4 to include Experimental Channel
    
    * Remove property that was causing the build to fail
    
    * Moving global properties to the main Yaml instead of step in order to unblock publishing
    
    * Committing xlfs and changing the build script to not update Xlf on build
    
    * clean up corefxlab-base.yml
    
    * sq
    
    * Delete unused files and scripts
    
    * Get rid of all the xlf stuff
    
    * Remove UpdateXlfOnBuild for non-NT builds
    
    * Minor cleanup
    
    * More cleanup
    
    * update eng\build.sh permission
    
    * Rename to Nuget.config
    
    * sq
    
    * Remove the runtime spec from global.json
    
    * Don't publish test projs
    
    * Typo
    
    * Move version prefix to versions.props
    Change prereleaselabel to alpha
    
    * Increment version number to list as the latest package
    Increment version number of Microsoft.Experimental.Collections to list as the latest package
    Turn off graph generation
    
    * Update the Readme
    
    * Test removing the scripts folder
    
    * Touch readme to force a change
    
    * Address Jose's comments
    
    * Typo
    
    * Move versions to eng/versions.props
    
    * Benchmark.proj needs to refer to xunit
    
    * Clean up dependencies.props
    
    * Remove dependencies.props
    
    Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
    Prashanth Govindarajan and joperezr authored Feb 7, 2020
    afdbc5b

Commits on Feb 20, 2020

  1. Rename Sort to OrderBy (dotnet#2814)

    * Rename sort to orderby and add orderbydescending method
    
    * Add doc strings
    
    * Update benchmark test
    
    * Update tests
    
    * Update DataFrameColumn to use orderby
    
    * Update doc comment
    
    * Additions to sortby
    
    * Revert "Additions to sortby"
    
    This reverts commit 3931d4e2a72ce44a539be7c27b2592395f3efd35.
    
    * Revert "Update doc comment"
    
    This reverts commit 192f7797fe2b77625486637badf77046162fedbf.
    
    * Revert "Update DataFrameColumn to use orderby"
    
    This reverts commit 8f94664c5fd18570cd2b601535e816ca5dd5e3c4.
    jwood803 authored Feb 20, 2020
    355d3fb

Commits on Mar 4, 2020

  1. Explode column types and generate converters (dotnet#2857)

    * Explode column types and generate converters
    
    * Clean this
    
    * sq
    
    * sq
    
    * Cherry pick for next commit
    
    * sq
    
    * Undo unnecessary change
    Prashanth Govindarajan authored Mar 4, 2020
    9e10004

Commits on Mar 9, 2020

  1. Address remaining concerns from the 2nd DataFrame API Review (dotnet#2861)
    
    * Move string indexer to Columns
    
    * API changes from the 2nd API review
    
    * Unit tests
    
    * Address comments
    Prashanth Govindarajan authored Mar 9, 2020
    1544c23

Commits on Mar 19, 2020

  1. Add binary operations and operators on the exploded columns (dotnet#2867)
    
    * Generate combinations of binary operations and Add
    
    * Numeric Converters and CloneAsNumericColumns
    
    * Binary, Comparison and Shift operations
    
    * Clean up and bug fix
    
    * Fix the binary op apis to not be overridden
    
    * Internal constructors for exploded types
    
    * Proper return types for exploded types
    
    * Update unit tests
    
    * Update csproj
    
    * Revert "Fix the binary op apis to not be overridden"
    
    This reverts commit 2dc2240c9449930139c1492d1388d5e1f8ba5fa1.
    
    * Bug fix and unit test
    
    * Constructor that takes in a container
    
    * Unit tests
    
    * Call the implementation where possible
    
    * Review sq
    
    * sq
    
    * Cherry pick for next commit
    
    * sq
    
    * Undo unnecessary change
    
    * Rename to the system namespace column types
    
    * Address comments
    
    * Push to pull locally
    
    * Mimic C#'s arithmetic grammar in DataFrame
    
    * Address feedback
    
    * Reduce the number of partial column definitions
    
    * Address feedback
    Prashanth Govindarajan authored Mar 19, 2020
    8d7fb66

Commits on Mar 20, 2020

  1. Add APIs to get the strongly typed columns from a DataFrame (dotnet#2878)
    
    * CP
    
    * sq
    
    * sq
    
    * Improve docs
    Prashanth Govindarajan authored Mar 20, 2020
    7ef10ba

Commits on Mar 21, 2020

  1. Enable xml docs for Data.Analysis (dotnet#2882)

    * Enable xml docs for Data.Analysis
    
    * Fix /// summary around inheritdoc
    
    * Minor doc changes
    
    * sq
    
    * sq
    
    * Address feedback
    Prashanth Govindarajan authored Mar 21, 2020
    4072f96

Commits on Mar 23, 2020

  1. Add Apply to ArrowStringDataFrameColumn (dotnet#2889)

    Prashanth Govindarajan authored Mar 23, 2020
    a6c34d0
  2. Support for Exploded columns types in Arrow and IO scenarios (dotnet#2885)
    
    * Support for Exploded columns types in Arrow and IO scenarios
    
    * Unit tests
    
    * Address feedback
    Prashanth Govindarajan authored Mar 23, 2020
    9c80608

Commits on Mar 24, 2020

  1. Bump version (dotnet#2890)

    Prashanth Govindarajan authored Mar 24, 2020
    d120982
  2. Fix versioning to allow for individual stable packages (dotnet#2891)

    * Fix versioning to allow for individual stable packages
    
    * sq
    Prashanth Govindarajan authored Mar 24, 2020
    59df417

Commits on Mar 25, 2020

  1. Bump Microsoft.Data.Analysis version to 0.4.0 (dotnet#2892)

    * Bump Microsoft.Data.Analysis version to 0.4.0
    eerhardt authored Mar 25, 2020
    d79dd2f

Commits on Apr 30, 2020

  1. Fix dotnet/corefxlab#2906 (dotnet#2907)

    * Fix dotnet/corefxlab#2906
    
    * Improvements and unit tests
    
    * sq
    
    * Better fix
    
    * sq
    Prashanth Govindarajan authored Apr 30, 2020
    28140bd

Commits on May 19, 2020

  1. Improve LoadCsv to handle null values when deducing the column types (dotnet#2916)
    
    * Unit test to repro
    
    * Fix dotnet/corefxlab#2915
    
    Append a null value to a column when encountering it instead of changing the column type to a StringDataFrameColumn
    
    * Update src/Microsoft.Data.Analysis/DataFrame.IO.cs
    
    Co-authored-by: Günther Foidl <gue@korporal.at>
    
    * Update src/Microsoft.Data.Analysis/DataFrame.cs
    
    Co-authored-by: Günther Foidl <gue@korporal.at>
    
    * Feedback
    
    Co-authored-by: Günther Foidl <gue@korporal.at>
    Prashanth Govindarajan and gfoidl authored May 19, 2020
    5c3ac8b
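
The behaviour described in this commit (a null cell is appended as a null value instead of turning the whole column into a string column) can be illustrated with a small language-agnostic sketch. This is not the library's C# code: the function names, the two "kinds" and the set of null tokens are all assumptions made for illustration, and defaulting an all-null column to "string" is a choice of this sketch.

```python
def _is_float(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def load_column(cells, null_tokens=("", "null")):
    """Guess a column kind from raw CSV cells, ignoring null cells.

    Hypothetical helper: null_tokens is an assumed set of markers,
    not the list the real LoadCsv recognises.
    """
    def is_null(cell):
        return cell.strip().lower() in null_tokens

    non_null = [c for c in cells if not is_null(c)]
    # Null cells do not take part in the type guess.
    if non_null and all(_is_float(c) for c in non_null):
        kind = "float"
        data = [None if is_null(c) else float(c) for c in cells]
    else:
        # Sketch-level choice: mixed or all-null columns fall back to string.
        kind = "string"
        data = [None if is_null(c) else c for c in cells]
    return kind, data
```

With this sketch, a column such as `["1", "", "2.5"]` stays numeric and carries a null in the second position.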

Commits on May 20, 2020

  1. Create a 0.4.0 package (dotnet#2918)

    Prashanth Govindarajan authored May 20, 2020
    0bef531
  2. Revert "Create a 0.4.0 package (dotnet#2918)" (dotnet#2919)

    This reverts commit 0bef531.
    Prashanth Govindarajan authored May 20, 2020
    b215eb4
  3. Produce a 0.4.0 build (dotnet#2920)

    Prashanth Govindarajan authored May 20, 2020
    3b4aafa

Commits on May 25, 2020

  1. 8d08434

Commits on May 26, 2020

  1. Increment version and stop producing stable packages (dotnet#2922)

    * Increment version and stop producing stable packages
    Prashanth Govindarajan authored May 26, 2020
    7dcf184

Commits on Jun 11, 2020

  1. Add DataFrame object formatter. (dotnet#2931)

    * Add DataFrame object formatter.
    
    * Update nuget dependencies.
    
    * Apply CR fixes.
    dcostea authored Jun 11, 2020
    881886b

Commits on Jun 24, 2020

  1. Fix a bug in InsertColumn

    RamonWill authored Jun 24, 2020
    6e60307

Commits on Jul 27, 2020

  1. Add Microsoft.Data.Analysis.nuget project (dotnet#2933)

    * Add DataFrame object formatter.
    
    * Update nuget dependencies.
    
    * Apply CR fixes.
    
    * Remove ReferenceOutputAssembly added to from Microsoft.Data.Analysys.csproj.
    
    * Add Microsoft.Data.Analysis.nuget project.
    
    * Move project to src. Fix nuget project settings.
    
    * Remove NoBuild property from project.
    
    * Remove IncludeBuildOutput and IncludeSymbols from project.
    
    * Add VersionPrefix to project.
    
    * Add IncludeBuildOutput property.
    
    * Add unit tests.
    
    * Downgrade from netcoreapp3.1 to netcoreapp3.0
    
    * Upgrade from netcoreapp3.0 to netcoreapp3.1 (dotnet interactive is not compatible with 3.0)
    
    * Add netcoreapp3.1 to global settings
    
    * Add dotnet 3.1.5 runtime to global settings
    
    * Build fixes
    
    * Moving MDAI into interactive-extensions folder of the package
    
    * Minor refactoring
    
    * Respond to PR feedback
    
    Co-authored-by: Prashanth Govindarajan <prgovi@microsoft.com>
    Co-authored-by: Jose Perez Rodriguez <joperezr@microsoft.com>
    Co-authored-by: Eric Erhardt <eric.erhardt@microsoft.com>
    4 people authored Jul 27, 2020
    6c2d800

Commits on Sep 2, 2020

  1. ColumnName indexer on DataFrame (dotnet#2959)

    * ColumnName indexer on DataFrame
    
    Fixes dotnet/corefxlab#2934
    
    * Unit tests
    
    * Null column name
    Prashanth Govindarajan authored Sep 2, 2020
    7ebe8bc
  2. Implement FillNulls() for ArrowStringDataFrameColumn with inPlace: false (dotnet#2956)
    
    * implement FillNulls method for ArrowStringDataFrameColumn
    
    * additional asserts for testcase
    RamonWill authored Sep 2, 2020
    54633a2

Commits on Sep 10, 2020

  1. Prevent DataFrame.Sample() method from returning duplicated rows (dotnet#2939)
    
    * resolves dotnet#2806
    
    * replace forloop with ArraySegment<T>
    
    * reduce shuffle loop operations from O(Rows.Count) to O(numberOfRows)
    RamonWill authored Sep 10, 2020
    4e6d801
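
One common way to get duplicate-free sampling with a shuffle loop that runs only numberOfRows times is a partial Fisher-Yates shuffle. The sketch below assumes that technique; the commit message does not name it, and this is not the library's C# implementation. The function name and parameters are hypothetical.

```python
import random


def sample_row_indices(row_count, number_of_rows, rng=None):
    """Return number_of_rows distinct row indices from range(row_count)."""
    if not 0 <= number_of_rows <= row_count:
        raise ValueError("number_of_rows must be between 0 and row_count")
    rng = rng or random.Random()
    indices = list(range(row_count))
    # Partial Fisher-Yates: settle only the first number_of_rows slots,
    # so the loop runs number_of_rows times, not row_count times.
    for i in range(number_of_rows):
        j = rng.randrange(i, row_count)  # pick from the not-yet-settled tail
        indices[i], indices[j] = indices[j], indices[i]
    # The settled prefix holds distinct indices by construction.
    return indices[:number_of_rows]
```

Building the index list is still linear in the row count; only the shuffle loop shrinks. Because each swap moves a distinct element into the settled prefix, no row index can appear twice.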

Commits on Oct 1, 2020

  1. Add WriteCsv plus unit tests. (dotnet#2947)

    * Add WriteCsv plus unit tests.
    
    * Add CultureInfo to WriteCsv. Remove index column param. Update unit tests.
    
    * Add CR changes. CultureInfo. Separator.
    
    * Format decimal types individually. Fix culture info. Fix unit tests.
    dcostea authored Oct 1, 2020
    81d0ba5

Commits on Oct 20, 2020

  1. Missing values default to a StringDataFrameColumn (dotnet#2982)

    * Make LoadCsv more robust
    
    * Test empty string column
    
    * Retain prev guess where possible
    Prashanth Govindarajan authored Oct 20, 2020
    db5c49e

Commits on Oct 23, 2020

  1. Update FromArrowRecordBatches for dotnet-spark (dotnet#2978)

    * Support for RecordBatches with StructArrays
    
    * Sq
    
    * Address comments
    
    * Nits
    
    * Nits
    Prashanth Govindarajan authored Oct 23, 2020
    cb7ab00

Commits on Oct 30, 2020

  1. Implement DataFrame.LoadCsvFromString (dotnet#2988)

    * Implement DataFrame.LoadCsvFromString
    
    * Address comments
    Prashanth Govindarajan authored Oct 30, 2020
    cff30e3

Commits on Dec 3, 2020

  1. Part 1 of porting the csv reader (dotnet#2997)

    Prashanth Govindarajan authored Dec 3, 2020
    e7a9c42

Commits on Mar 3, 2021

  1. Merge branch 'port' of ../corefxlab into DataFrame_1

    Prashanth Govindarajan committed Mar 3, 2021
    881c619
  2. Move to the test folder

    Prashanth Govindarajan committed Mar 3, 2021
    1c1c3a8
  3. Suppress warnings

    Prashanth Govindarajan committed Mar 3, 2021
    fea6bd2
  4. Move extensions reference out of props

    Make MDA.test use the props defined TFM
    Comment out 2 unit tests
    Prashanth Govindarajan committed Mar 3, 2021
    0b8541a

Commits on Mar 5, 2021

  1. Address feedback

    Prashanth Govindarajan committed Mar 5, 2021
    bf82179

Commits on Mar 8, 2021

  1. Address feedback

    Prashanth Govindarajan committed Mar 8, 2021
    9d74a83
  2. Default to preview version

    Prashanth Govindarajan committed Mar 8, 2021
    fa39b74

Commits on Mar 10, 2021

  1. Update nuget.config

    Prashanth Govindarajan committed Mar 10, 2021
    216554a