Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storing unicode characters in SQL Server #215

Closed
jnolis opened this issue Jan 5, 2018 · 6 comments
Closed

Storing unicode characters in SQL Server #215

jnolis opened this issue Jan 5, 2018 · 6 comments

Comments

@jnolis
Copy link

jnolis commented Jan 5, 2018

I'm trying to write Unicode strings from R to SQL. Unfortunately, the Unicode characters only seem to work when I load the table back into R, and not when I view the table in SSMS or A different tool.

require(odbc)
require(DBI)
require(dplyr)
con <- DBI::dbConnect(odbc::odbc(),
    .connection_string = "DRIVER={ODBC Driver 13 for SQL Server};SERVER=Test; Database=Test; trusted_connection=yes;")
testData <- data_frame(Characters = "")
dbWriteTable(con,"TestUnicode",testData,overwrite=TRUE)
result <- dbReadTable(con, "TestUnicode")

Successfully yields:

> result$Characters
[1] ""

However, when I pull that table in SSMS:

SELECT * FROM TestUnicode

I get two different characters:

Characters
â¤

It looks like the column stored in SQL is a varchar rather than nvarchar, and somehow the binary value of the particular heart character ends up being different than what I get when I insert a row from SSMS. Can the unicode characters be stored in a way that they can be used universally and not just in R?

@krlmlr
Copy link
Member

krlmlr commented Jan 5, 2018

Thanks. This is likely to be a problem with the odbc package, or the underlying ODBC driver for SQL server.

@jimhester: Does this ring a bell?

@jnolis
Copy link
Author

jnolis commented Jan 9, 2018

I actually found a workaround. The issue is that when R writes to SQL, it writes the strings as UTF-8 and stores them in a VARCHAR(255) column. SQL SERVER needs the characters to be UTF-16LE encoded and stored in an NVARCHAR(255) column. Thus, you need to both:

  1. Manually create the table in SQL Server first, making sure to create an NVARCHAR column
  2. Change the column in R from being a string column to a list column of UTF-16LE raw bytes.

This seems like something that should still be handled by either DBI or ODBC or something though.

require(odbc)
require(DBI)
require(dplyr)
require(stringr)
require(purrr)

# This function takes a string vector and turns it into a list of raw UTF-16LE bytes. 
# These will be needed to load into SQL Server
convertToUTF16 <- function(s){
  map(s, function(x) iconv(x,from="UTF-8",to="UTF-16LE",toRaw=TRUE) %>% unlist)
}

#creat a connection to a sql table
connectionString <- "DRIVER={ODBC Driver 13 for SQL Server}; SERVER=[Server]; Database=Test; trusted_connection=yes;"
con <- DBI::dbConnect(odbc::odbc(),
                      .connection_string = connectionString)

# Create the SQL Table. THIS IS REALLY IMPORTANT! 
# If the dbWriteTable creates the table, then it will incorrectly 
# make the Char column a VARCHAR and not an NVARCHAR
dbExecute(con, "CREATE TABLE UnicodeExample (ID INT, Char NVARCHAR(MAX))")

# our example data
testData <- data_frame(ID = c(1,2,3), Char = c("I", "","Apples"))

# we adjust the column with the UTF-8 strings to instead 
# be a list column of UTF-16LE bytes
testData <- testData %>% mutate(Char = convertToUTF16(Char))

# write the table to the database
dbWriteTable(con, "UnicodeExample",testData,append=TRUE)

dbDisconnect(con)

@jimhester
Copy link
Contributor

FWIW you can also use the field.types parameter of odbc::dbWriteTable() to override any or all of the default types for creating the table.

dbWriteTable(con, "UnicodeExample", testData, field.types = c(Char = "NVARCHAR(MAX)"))

@jnolis
Copy link
Author

jnolis commented Jan 9, 2018

That's good to know, thanks!

@krlmlr krlmlr closed this as completed Jan 25, 2018
@iamsaini87
Copy link

@jimhester @jnolis I used above steps and able to push Japanese data from R/CSV to Microsoft sql DB successfully.
But now when i am trying to read data from DB using R i am getting text as “????”
Event tried to pass encoding = "UTF-8" in dbconnect but no luck.
Any pointers would be appreciated, thanks

  library(sqldf)
  library(odbc)
    con <- dbConnect(odbc(),
         Driver = "SQL Server",
         Server = "MSSQL2016",
         Database = "schema",
         UID = "sa",
         PWD = "mypasswprd")

      Query2 = "SELECT UserIntents
      FROM Performance_by_topic 
      GROUP BY UserIntents"

      QueryDF2 = sqldf(Query2,conn = con)   

Below is the result of QueryDF2

   UserIntents     
   ??????????????????    
   ???????????????????    
   ????????????????????    
   ????????????????? 

@philibe
Copy link

philibe commented Nov 12, 2019

Following field.types = c(Char = "NVARCHAR(MAX)") but with vector and compute of max because of the error "dbReadTable/dbGetQuery returns Invalid Descriptor Index" (r-dbi/odbc#112) :

  
vector_nvarchar<-c(Filter(Negate(is.null), 
                              (
                                lapply(testData,function(x){
                                  if (is.character(x) ) c(
                                    names(x),
                                    paste0("NVARCHAR(", 
                                           max(
                                             # nvarchar(max) gave error dbReadTable/dbGetQuery returns Invalid Descriptor Index error on SQL server 
                                             # https://github.com/r-dbi/odbc/issues/112  
                                             # so we compute the max                                           
                                             nchar(
                                               iconv( #nchar doesn't work for UTF-8 :  help (nchar)
                                                 Filter(Negate(is.null),x)
                                                 ,"UTF-8","ASCII",sub ="x" 
                                               )
                                             )
                                             ,na.rm = TRUE)
                                           ,")"
                                    )
                                  )
                                })
                              )
    ))

con= DBI::dbConnect(odbc::odbc(),.connection_string=xxxxt, encoding = 'UTF-8')

DBI::dbWriteTable(con,"UnicodeExample",testData, overwrite= TRUE, append=FALSE, field.types= vector_nvarchar)

 DBI::dbGetQuery(con,iconv('select * from UnicodeExample'))

@github-actions github-actions bot locked and limited conversation to collaborators Nov 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants