Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Groupby behaves differently depending on the order of the columns #396

Closed
igonro opened this issue Feb 21, 2022 · 10 comments · Fixed by #398
Closed

Groupby behaves differently depending on the order of the columns #396

igonro opened this issue Feb 21, 2022 · 10 comments · Fixed by #398
Assignees

Comments

@igonro
Copy link
Contributor

igonro commented Feb 21, 2022

Describe the bug
When creating a DataFrame, depending on the order of the columns the groupby() function works properly or returns an error.

To Reproduce
This column order works perfectly:

let data = {
    worker: ["david", "david", "john", "alice", "john", "david"],
    hours: [5, 6, 2, 8, 4, 3],
    day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};
let df = new dfd.DataFrame(data);

df.groupby(["day"]).col(["hours"]).sum().print()

// ╔════════════╤═══════════════════╤═══════════════════╗
// ║            │ day               │ hours_sum         ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 0          │ monday            │ 5                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 1          │ tuesday           │ 6                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 2          │ wednesday         │ 2                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 3          │ thursday          │ 8                 ║
// ╟────────────┼───────────────────┼───────────────────╢
// ║ 4          │ friday            │ 7                 ║
// ╚════════════╧═══════════════════╧═══════════════════╝

df.groupby(["worker"]).count().print()
// ╔════════════╤═══════════════════╤═══════════════════╤═══════════════════╗
// ║            │ worker            │ hours_count       │ day_count         ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 0          │ david             │ 3                 │ 3                 ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 1          │ john              │ 2                 │ 2                 ║
// ╟────────────┼───────────────────┼───────────────────┼───────────────────╢
// ║ 2          │ alice             │ 1                 │ 1                 ║
// ╚════════════╧═══════════════════╧═══════════════════╧═══════════════════╝

But when I change the column order to the following it doesn't work:

let data = {
    hours: [5, 6, 2, 8, 4, 3],
    worker: ["david", "david", "john", "alice", "john", "david"],
    day: ["monday", "tuesday", "wednesday", "thursday", "friday", "friday"],
};
let df = new dfd.DataFrame(data);

df.groupby(["day"]).col(["hours"]).sum().print()
// Uncaught Error: Can't perform math operation on column hours
//    arithemetic groupby.ts:266
//    operations groupby.ts:417
//    count groupby.ts:431

df.groupby(["worker"]).count().print()
// Uncaught Error: Can't perform math operation on column hours
//    arithemetic groupby.ts:266
//    operations groupby.ts:417
//    count groupby.ts:431

Expected behavior
I would expect that changing the order of the columns wouldn't make any change on the result.

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Firefox v97.0.1, Chrome v98.0.4758.102, Edge v98.0.1108.56
  • Version: -

Additional context
I'm using the browser version, not the node.js one.

@igonro
Copy link
Contributor Author

igonro commented Feb 22, 2022

After having debugged the code a bit, I think maybe I have been able to find the error.

In groupby.ts, line 264:

let colDtype = this.colDtype[colIndex]

When stopped at this line in the example that returns the error we have:

colKey = 0 // the variable being iterated in the for loop
colName = "hours" // the name of the column in the current iteration
colIndex = 0 // the index of the column in the current iteration

this.colDtype = ["string"] // this refers to the Dtype of the groupby column ("day" or "worker" in my example)

so when we execute let colDtype = this.colDtype[colIndex] we are trying to get the dtype of the column "hours" (which is int32), but we are looking in the this.colDtype list (which only contains "string" and it refers to the column "day").

I don't understand why we need to check if the colDtype is string for the operation count, because we can also count string columns.

I will do more debug in order to fully understand the problem, but since I can't program in Typescript is a little bit hard for me.

thanks.

@risenW
Copy link
Member

risenW commented Feb 22, 2022

After having debugged the code a bit, I think maybe I have been able to find the error.

In groupby.ts, line 264:

let colDtype = this.colDtype[colIndex]

When stopped at this line in the example that returns the error we have:

colKey = 0 // the variable being iterated in the for loop
colName = "hours" // the name of the column in the current iteration
colIndex = 0 // the index of the column in the current iteration

this.colDtype = ["string"] // this refers to the Dtype of the groupby column ("day" or "worker" in my example)

so when we execute let colDtype = this.colDtype[colIndex] we are trying to get the dtype of the column "hours" (which is int32), but we are looking in the this.colDtype list (which only contains "string" and it refers to the column "day").

I don't understand why we need to check if the colDtype is string for the operation count, because we can also count string columns.

I will do more debug in order to fully understand the problem, but since I can't program in Typescript is a little bit hard for me.

thanks.

Thanks for taking out time to submit this issue. @steveoni Can you take a look.

@igonro
Copy link
Contributor Author

igonro commented Feb 22, 2022

Hi @risenW and @steveoni, thanks for your quick response!

I was trying to write some tests and contribute to the project fixing this issue, but I couldn't run the tests. If you can help me solving this error, I could try to fix this myself and open a PR, I'm trying to learn a bit of Typescript 😄

When I first tried to run all the tests with yarn test (after running yarn install), I got the following error:

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:67:19)
    at Object.createHash (node:crypto:130:10)
    at BulkUpdateDecorator.hashFactory (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/util/createHash.js:144:18)
    at BulkUpdateDecorator.update (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/util/createHash.js:46:50)
    at OriginalSource.updateHash (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack-sources/lib/OriginalSource.js:104:8)
    at NormalModule._initBuildHash (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:753:17)
    at handleParseResult (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:817:10)
    at /home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:908:4
    at processResult (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:640:11)
    at /home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:692:5 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v17.2.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Then I tried to run only the groupby.test.js file changing the package.json file, and I got the following error:

yarn run v1.22.17
$ /home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js && yarn
/home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js: 1: /bin: Permission denied
/home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js: 3: Syntax error: word unexpected (expecting ")")
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Maybe I have a wrong version installed of yarn or node? These are my versions:

$ node --version
v17.2.0
$ npm --version
8.1.4
$ yarn --version
1.22.17

@risenW
Copy link
Member

risenW commented Feb 22, 2022

Hi @risenW and @steveoni, thanks for your quick response!

I was trying to write some tests and contribute to the project fixing this issue, but I couldn't run the tests. If you can help me solving this error, I could try to fix this myself and open a PR, I'm trying to learn a bit of Typescript 😄

When I first tried to run all the tests with yarn test (after running yarn install), I got the following error:

Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:67:19)
    at Object.createHash (node:crypto:130:10)
    at BulkUpdateDecorator.hashFactory (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/util/createHash.js:144:18)
    at BulkUpdateDecorator.update (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/util/createHash.js:46:50)
    at OriginalSource.updateHash (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack-sources/lib/OriginalSource.js:104:8)
    at NormalModule._initBuildHash (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:753:17)
    at handleParseResult (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:817:10)
    at /home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:908:4
    at processResult (/home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:640:11)
    at /home/iago/workspace/danfojs/src/danfojs-browser/node_modules/webpack/lib/NormalModule.js:692:5 {
  opensslErrorStack: [ 'error:03000086:digital envelope routines::initialization error' ],
  library: 'digital envelope routines',
  reason: 'unsupported',
  code: 'ERR_OSSL_EVP_UNSUPPORTED'
}

Node.js v17.2.0
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Then I tried to run only the groupby.test.js file changing the package.json file, and I got the following error:

yarn run v1.22.17
$ /home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js && yarn
/home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js: 1: /bin: Permission denied
/home/iago/workspace/danfojs/src/danfojs-browser/tests/aggregators/groupby.test.js: 3: Syntax error: word unexpected (expecting ")")
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Maybe I have a wrong version installed of yarn or node? These are my versions:

$ node --version
v17.2.0
$ npm --version
8.1.4
$ yarn --version
1.22.17

Can you install and work with node v14. You can use nvm to install a specific version of Nodejs and use it.

@igonro
Copy link
Contributor Author

igonro commented Feb 22, 2022

Thanks @risenW, I will try it!

@igonro
Copy link
Contributor Author

igonro commented Feb 22, 2022

Now I'm having a problem with ChromeHeadless in WSL2.

No binary for ChromeHeadless browser on your platform

I saw that @sponsfreixes had a similar issue in #173, I will try to fix it myself 😅

@sponsfreixes
Copy link

@igonro I can't remember how I fixed the issue with headless chrome and WSL... I wonder if I ended ditching WSL and making it work on Windows, or even just running it on some other computer running a native Linux. Sorry for not being able to help on that!

@igonro
Copy link
Contributor Author

igonro commented Feb 23, 2022

@sponsfreixes thanks for your response, and don't worry I was able to solve it and run the tests! 😄

@steveoni
Copy link
Member

@igonro great catch. this was as a result of code change from the previous version

Here is how to fix the error base on column wise operation you made mention of. this line should be replace

const colDtype = this.dtypes.filter((val, index) => {

to this

const colDtype = this.dtypes

Then to enable arithmetic operation for count, which is ideal for non-numerical columns, change this

if (colDtype === "string") throw new Error(`Can't perform math operation on column ${colName}`)

to this

 if (typeof operation === "string") {
      if (colDtype === "string" && operation !=="count") throw new Error(`Can't perform math operation on column ${colName}`)
}
 else {
      if (colDtype === "string" && operation[colName] !=="count") throw new Error(`Can't perform math operation on column ${colName}`)
 }

@igonro
Copy link
Contributor Author

igonro commented Feb 23, 2022

Hi @steveoni, thanks for the help! I was having some troubles finding that bug in frame.ts.

I've just opened a PR in #398. Feel free to modify or add corrections to adjust the code to your liking 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants