Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas #1

Open
11 tasks
ankane opened this issue Jun 8, 2018 · 18 comments
Open
11 tasks

Ideas #1

ankane opened this issue Jun 8, 2018 · 18 comments

Comments

@ankane
Copy link
Owner

ankane commented Jun 8, 2018

Please create a new issue to discuss any ideas or share your own.

0.4.0 release

  • Make adapter first param of dbxConnect (to match dbConnect)
  • Prefer RMariaDB over RMySQL when adapter is mysql
  • Raise error when bigint value out of numeric range - bigint3 or bigint4 branch

Ideas

On hold

  • More efficient update query - use temp table (alternative approach for Postgres is update ... from link, but this requires typecasting)
  • Add retries option to dbxSelect - undo revert of dd9d716651f9321afe74def1fac491de4a4daca0 (users can use a separate package like retry for this)
  • Add statement_timeout option to dbxSelect and maybe other commands - probably not common
  • Make conn optional (use global connection if not passed)
  • Add a way to easily create tables - dbxCreateTable(db, table, records, types=list(id="bigserial primary key", score="float")) (only first 3 args required, types can be used to override defaults)
  • Add cache option to cache select statements to disk - cache branch (need ability to cache for duration and force refresh)
  • Run tests against BigQuery
@jangorecki
Copy link

there could be dbx_logging option, similar to dbx_verbose but logs would goes into db table instead of stderr (as in case of message).

@jangorecki
Copy link

jangorecki commented Jun 20, 2018

if possible to handle those two functionalities in base R it would make dbx package lightweight.

importFrom(urltools,get_credentials)
importFrom(urltools,url_parse)

@jangorecki
Copy link

jangorecki commented Jun 20, 2018

Examples of using RPostgreSQL would be nice, it is much lighter than RPostgres.
Not sure how it is currently but few years back I could not achieve few functionalities in RPostgres I was using in RPostgreSQL. There was some heavy dev to RPostgres in late 2017 so some of those could have been addressed by now. Anyway RPostgreSQL is battle tested 10 years old package so definitely should not be skipped.
Not sure, but it may help, I have a package on RPostgreSQL doing upsert, etc: https://github.com/jangorecki/pg

@ankane
Copy link
Owner Author

ankane commented Jun 20, 2018

Hey @jangorecki, thanks for the ideas. I added support for RPostgreSQL. I agree with keeping the package lightweight and minimizing dependencies, but don't want to reinvent/maintain a URL parser. As for logging, you can now pass a function to dbx_verbose and do what you'd like with the logs.

writeLogs <- function(sql) {
  # your code
}

options(dbx_verbose=writeLogs)

@ankane
Copy link
Owner Author

ankane commented Jul 4, 2018

Quick follow-up: Made urltools optional since it's only needed if you use the url functionality. Didn't realize that it had two dependencies itself.

@artemklevtsov
Copy link

artemklevtsov commented Jan 5, 2019

It could be nice to add wrapper to load a CSV/TSV files. dbxCopy or dbxLoad for example.

@ankane
Copy link
Owner Author

ankane commented Mar 8, 2019

Hey @artemklevtsov, thanks for the suggestion. Since it only takes 1 line to convert a CSV/TSV into a data frame, I'm not sure a separate function is needed.

@artemklevtsov
Copy link

I mean this function should use the effective COPY INTO instead INSERT.

@stefanfritsch
Copy link

What would be nice is a way to add missing columns when trying to update, insert or upsert.

E.g. I'm getting data from a web service where new columns can crop up now and then, e.g. "label2" (this example is taken from the gitlab API). As far as I know there's currently no easy way to emulate data.table::rbind(fill = TRUE), i.e. to first do an ALTER TABLE to add the missing columns and then use UPDATE to push the data to the database.

@ankane
Copy link
Owner Author

ankane commented Nov 3, 2019

@stefanfritsch thanks for the suggestion. This use case doesn't seem very common, so I think it's best accomplished outside of dbx.

@csatzky
Copy link

csatzky commented Mar 11, 2020

Hi Ankane!
I think a nice feature would be to have something like dbxRead() in which it is possible to provide a 'where' data frame, 'return_cols' (default: 'ALL') and a batch_size argument. E.g. I have tables with primary keys consisting of two or more columns. Hence dbxSelect with the 'params' argument does not work, for it can only handle only one column / a vector? Also the batch_size argument would be helpful for large tables. Cheers!

@ankane
Copy link
Owner Author

ankane commented Mar 17, 2020

Hey @Budfox123, thanks for the suggestions. I've considered adding a more ORM-type function for constructing queries (instead of writing SQL manually), but want to keep things simple for now. You can probably write a function that wraps dbxSelect to do this. If you do, please share.

@psimm
Copy link

psimm commented Sep 17, 2020

Hi @ankane, would it be possible to let dbxUpdate and dbxUpsert return the number of changed (or sum of changed and added) rows?

@ankane
Copy link
Owner Author

ankane commented Sep 28, 2020

Hey @psimm, both run dbExecute behinds the scenes (unless returning is passed to upsert), so it could probably sum them. Can you explain the use case a bit more?

@psimm
Copy link

psimm commented Sep 28, 2020

Hi @ankane thanks for your reply! Sorry, I think my previous suggestion of summing them was not a good idea. Here is my use case: I am downloading data from an API on a schedule. The API sometimes returns values that are already in the DB, so I use dbxUpsert. I am keeping logs of all interactions with the API. Ideally, dbxUpsert would let me know how many are new and how many are updated.

@ankane
Copy link
Owner Author

ankane commented Sep 28, 2020

If you're on Postgres and use the version from GitHub, you can do:

dbxUpsert(db, table, records, where_cols=c("id"), returning=DBI::SQL("id, (xmax = 0) AS inserted"))

I'm not sure if it's possible with the other databases.

@psimm
Copy link

psimm commented Sep 28, 2020

Awesome! Yes, I am using Postgres. Thank you for adding that.

@SurajitShasmal
Copy link

Error: Failed to fetch row: ERROR: ON CONFLICT DO UPDATE command cannot affect row a second time
HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.

Repository owner locked and limited conversation to collaborators Dec 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants