No slug-representation of emojis/pictograms #49

Open
remnestal opened this issue Jan 28, 2020 · 3 comments

Comments

@remnestal

remnestal commented Jan 28, 2020

This seems like a silly use case at first, but when creating a slug from a string containing emojis or pictograms, there is no representation of those characters. For example:

slug.Make("🐛")
slug.Make("☺")
slug.Make("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥")

all yield empty strings.

I'm not sure how such a character would best be represented in a slug, but simply removing them could be problematic in some cases. Is this intentional?

@remnestal
Author

@dalu Let's say I have a blog platform where I let my customers set the title of their posts. I want the title of their posts to be turned into a slug for the URL. For example, let's say there's a post titled "No slug-representation of emojis/pictograms", then I expect the URL to look something like example.com/posts/no-slug-representation-of-emojis-and-pictograms. No problem.

But let's then say that a user has created two posts whose titles contain more than just the "standard" ASCII characters:

  • "𝕋𝕙𝕖𝕤𝕖 𝕔𝕙𝕒𝕣𝕒𝕔𝕥𝕖𝕣𝕤 𝕒𝕣𝕖 𝕣𝕖𝕒𝕝𝕝𝕪 𝕡𝕠𝕡𝕦𝕝𝕒𝕣 𝕠𝕟 𝕚𝕟𝕤𝕥𝕒𝕘𝕣𝕒𝕞", and
  • "𝕳𝖔𝖜 𝖙𝖔 𝖈𝖗𝖊𝖆𝖙𝖊 𝖆 𝖍𝖆𝖗𝖉𝖈𝖔𝖗𝖊 𝖉𝖊𝖆𝖙𝖍 𝖒𝖊𝖙𝖆𝖑 𝖇𝖑𝖔𝖌 𝖕𝖔𝖘𝖙 𝖙𝖎𝖙𝖑𝖊"

Then both of those blog posts would have the slug "", which is problematic. Don't focus too much on the 🐛 emoji in my previous example; there's a lot of Unicode not covered by this package that can make URLs collide.

I realize that there is no obvious solution to this problem (in fact, I said so in the last sentence of my original post), but forcing every platform to implement a huge custom substitution map for all of these characters is hardly a satisfying solution.
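
One way to avoid the collision without a substitution map, sketched here under stated assumptions: when the slug comes back empty, derive a short deterministic identifier from the original title. The helper name `slugOrHash`, the `post-` prefix, and the 8-character truncation are hypothetical choices, and the second argument merely stands in for whatever slug.Make returned.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// slugOrHash returns s unchanged when it is non-empty. Otherwise it derives
// a short, deterministic identifier from the original title, so that two
// different titles which both slugify to "" still get different URLs.
// The "post-" prefix and the 8-character length are arbitrary assumptions.
func slugOrHash(title, s string) string {
	if s != "" {
		return s
	}
	sum := sha256.Sum256([]byte(title))
	return "post-" + hex.EncodeToString(sum[:])[:8]
}

func main() {
	// The empty second argument mimics an empty result from slug.Make.
	fmt.Println(slugOrHash("𝕗𝕒𝕟𝕔𝕪 𝕥𝕖𝕩𝕥", ""))
	fmt.Println(slugOrHash("☺", ""))
	// A non-empty slug passes through untouched.
	fmt.Println(slugOrHash("Plain title", "plain-title"))
}
```

Truncating the hash keeps URLs short but makes collisions only unlikely, not impossible, so a uniqueness check in storage would still be needed.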

@alex-dodich

I'm facing the same problem, and it looks like there is no way to make a slug from unsupported symbols :(
As a workaround, you can do something like this:

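import (
	"fmt"
	"time"

	"github.com/gosimple/slug"
)

func createSlug(title string) string {
	// Generate a non-empty slug; fall back when every character is stripped.
	pSlug := slug.Make(title)
	if pSlug == "" {
		pSlug = "untitled"
	}

	// Append a "random" part to keep the slug unique.
	// Reuse pSlug here; calling slug.Make(title) again would discard the fallback.
	return fmt.Sprintf("%s-%d", pSlug, time.Now().Nanosecond()/1000)
}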
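
Note that the time-based suffix in the snippet above changes on every call, and `Nanosecond()/1000` only gives the microsecond offset within the current second, so it does not strictly guarantee uniqueness. A possible alternative, as a rough sketch: check the candidate against the slugs already in use and append a counter. `uniqueSlug` and the `taken` map are hypothetical; the map is an in-memory stand-in for a database lookup.

```go
package main

import "fmt"

// uniqueSlug returns base if it is free, otherwise base-2, base-3, and so on.
// taken is a hypothetical in-memory stand-in for a database lookup.
// An empty base falls back to "untitled", as in the snippet above.
func uniqueSlug(base string, taken map[string]bool) string {
	if base == "" {
		base = "untitled"
	}
	candidate := base
	for n := 2; taken[candidate]; n++ {
		candidate = fmt.Sprintf("%s-%d", base, n)
	}
	taken[candidate] = true // reserve the slug for later calls
	return candidate
}

func main() {
	taken := map[string]bool{}
	// Both calls pass "" to mimic two titles that slug.Make reduced to nothing.
	fmt.Println(uniqueSlug("", taken))            // untitled
	fmt.Println(uniqueSlug("", taken))            // untitled-2
	fmt.Println(uniqueSlug("hello-world", taken)) // hello-world
}
```

Unlike the timestamp, the result depends only on the base and the set of existing slugs, but in a real system the check and the reservation would have to be atomic.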

@matrixik
Member

Thank you for this report and sorry it took so long, burnout is not nice...

First: https://github.com/rainycape/unidecode, which the slug package uses underneath, has a test showing that it strips emojis:

https://github.com/rainycape/unidecode/blob/cb7f23ec59bec0d61b19c56cd88cee3d0cc1870c/unidecode_test.go#L30-L33

I forked it to https://github.com/gosimple/unidecode
It's true that it's missing a lot of characters that could be properly converted to ASCII, and everyone is welcome to contribute updates (at some point I'll also merge additions from forks, such as https://github.com/cuilun/unidecode).

Second: from the beginning I designed slug to be on the safe side. For example, I also used it for generating file names, so characters like / should not appear in the output.

Third: I will not change the default behavior (I don't want to break anyone's code), but it would be possible to add a flag such as AllPrintableASCII, set to false by default, that allows all characters from https://en.wikipedia.org/wiki/ASCII#Printable_characters (spaces would still be replaced with -).

Or maybe just export:

slug/slug.go, line 35 in a0807d1:

regexpNonAuthorizedChars = regexp.MustCompile("[^a-zA-Z0-9-_]")

so everyone could configure it themselves? I'm open to your ideas.
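
To make the two options concrete, here is a rough sketch of how a configurable filter could look. It is not the library's code: `DefaultNonAuthorized`, `PrintableNonAuthorized`, and `sanitize` are hypothetical names, and `sanitize` covers only the filtering step (no transliteration), so it is not equivalent to slug.Make.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// DefaultNonAuthorized uses the same pattern as the line quoted above.
	DefaultNonAuthorized = regexp.MustCompile("[^a-zA-Z0-9-_]")
	// PrintableNonAuthorized is an assumed reading of the AllPrintableASCII
	// idea: keep every printable ASCII character except space (0x21 to 0x7E).
	PrintableNonAuthorized = regexp.MustCompile(`[^\x21-\x7E]`)
)

// sanitize replaces spaces with "-" and then removes every character
// matched by nonAuthorized. It is a simplified stand-in for the filtering
// step only.
func sanitize(s string, nonAuthorized *regexp.Regexp) string {
	s = strings.ReplaceAll(s, " ", "-")
	return nonAuthorized.ReplaceAllString(s, "")
}

func main() {
	fmt.Println(sanitize("a/b c!", DefaultNonAuthorized))   // ab-c
	fmt.Println(sanitize("a/b c!", PrintableNonAuthorized)) // a/b-c!
}
```

Either pattern still removes characters such as ☺, so a configurable filter alone would not address the empty-slug case from the original report.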
