Frank Koehl edited this page Aug 29, 2022 · 11 revisions

Extensions to String

String#dedupe

dedupe(str)

Find multiple consecutive occurrences of a character and reduce them to a single occurrence.

'hello___world'.dedupe('_')
# => 'hello_world'

'/crazy//concatenated////file/path'.dedupe('/')
# => '/crazy/concatenated/file/path'

You can dedupe multiple characters by passing them all together within a single string.

'foo___bar_baz---bing'.dedupe('-_')
# => 'foo_bar_baz-bing'
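
For the curious, here is a minimal sketch of how this kind of deduplication can be done in plain Ruby. The method name dedupe_sketch is hypothetical and this is not necessarily how the library implements dedupe; it simply collapses runs of each listed character one at a time.

```ruby
# Hypothetical stand-in for dedupe, written as a plain method.
# Assumption: every character in `chars` is treated literally.
def dedupe_sketch(str, chars)
  chars.each_char.reduce(str.dup) do |result, char|
    # Collapse any run of two or more of this character into one
    result.gsub(/#{Regexp.escape(char)}{2,}/, char)
  end
end

dedupe_sketch('hello___world', '_')
dedupe_sketch('foo___bar_baz---bing', '-_')
```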

dedupe won't automatically strip leading or trailing characters. You'll want to combine it with strip_all to do that.

'/crazy//concatenated////file/path/'.dedupe('/')
# => '/crazy/concatenated/file/path/'

'/crazy//concatenated////file/path/'.dedupe('/').strip_all('/')
# => 'crazy/concatenated/file/path'

Bang variant

dedupe! will perform the modifications in place, rather than returning a copy.

String#keyify

Sometimes we find ourselves in need of a codified version of a string value. For example, user-generated values that must be compared for basic sameness, or creating database keys based on user-driven data entry. We use keyify in these situations to normalize the string down into a handy code for these comparison and data storage purposes.

keyify will perform the following actions...

  1. Replace all non-alphanumerics with underscores
  2. Convert any existing CamelCase into snake_case
  3. Strip any leading numbers and underscores
  4. Combine multiple consecutive underscores into a single one
  5. Convert to lowercase
  6. Return as a symbol

'FooBarBaz'.keyify
# => :foo_bar_baz

"Foo-Bar'Baz".keyify
# => :foo_bar_baz

'1234FooBAR'.keyify
# => :foo_bar

# Works with symbols as well
:FooBarBaz.keyify
# => :foo_bar_baz
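
The six steps listed above can be sketched in plain Ruby roughly like this. The method name keyify_sketch and the exact regular expressions are assumptions made for illustration; the library's own implementation may differ, particularly in how it handles runs of capital letters.

```ruby
# Hypothetical stand-in for keyify that follows the documented steps in order.
def keyify_sketch(value)
  str = value.to_s                              # also covers symbols and classes
  str = str.gsub(/[^A-Za-z0-9]/, '_')           # 1. non-alphanumerics become underscores
  str = str.gsub(/([a-z\d])([A-Z])/, '\1_\2')   # 2. CamelCase boundaries get an underscore
  str = str.sub(/\A[0-9_]+/, '')                # 3. strip leading numbers and underscores
  str = str.gsub(/_+/, '_')                     # 4. collapse consecutive underscores
  str = str.downcase                            # 5. lowercase
  str.to_sym                                    # 6. return a symbol
end

keyify_sketch('1234FooBAR')
keyify_sketch("Foo-Bar'Baz")
```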

Say a person's name is entered into a system by two different people, and we must now compare the values to see if they match. We all know user-entered data sucks; hopefully keyify can make it suck just a little less.

'John Doe'.keyify
# => :john_doe

'JOHN   DOE'.keyify
# => :john_doe

'John Doe'.keyify == 'JOHN   DOE'.keyify
# => true

"Ted O'Baxter".keyify == 'Ted O Baxter'.keyify
# => true

How about a dropdown menu populated with options created by end users? An identifier other than the database's primary key can often be useful.

'Not a covered benefit'.keyify
# => :not_a_covered_benefit

"User's Duplicate Claim".keyify
# => :user_s_duplicate_claim

"Included in global amount/bundled".keyify
# => :included_in_global_amount_bundled

In case you need something from the Ruby-verse, keyify also works on static class declarations.

Integer.keyify
# => :integer

Math::DomainError.keyify
# => :math_domain_error

It also makes it easy to build a hash with keys based on string values.

my_hash = {}
['Option A', 'Option B', 'Option C', 'Option D'].each do |opt|
  my_hash[opt.keyify] = opt
end

my_hash
# => {:option_a=>"Option A", :option_b=>"Option B", :option_c=>"Option C", :option_d=>"Option D"}

Bang variant

The keyify! version performs the same actions, but will raise an ArgumentError if the value being keyified results in an empty string.

'  '.keyify!
# => ArgumentError: "  " cannot be keyified, no valid characters

'!@#$%^'.keyify!
# => ArgumentError: "!@#$%^" cannot be keyified, no valid characters

'12345678'.keyify!
# => ArgumentError: "12345678" cannot be keyified, no valid characters
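
The empty-result check can be sketched as follows. This is a hypothetical standalone method (keyify_bang_sketch), with a simplified normalization that is assumed rather than taken from the library; the point is only to show where the ArgumentError comes from.

```ruby
# Hypothetical stand-in for keyify!: normalize, then refuse an empty result.
def keyify_bang_sketch(value)
  key = value.to_s
             .gsub(/[^A-Za-z0-9]/, '_')          # non-alphanumerics become underscores
             .gsub(/([a-z\d])([A-Z])/, '\1_\2')  # split CamelCase boundaries
             .sub(/\A[0-9_]+/, '')               # strip leading numbers and underscores
             .gsub(/_+/, '_')                    # collapse consecutive underscores
             .downcase

  # Nothing usable survived, so raise instead of returning an empty symbol
  if key.empty?
    raise ArgumentError, "#{value.inspect} cannot be keyified, no valid characters"
  end

  key.to_sym
end

begin
  keyify_bang_sketch('12345678')
rescue ArgumentError => e
  puts e.message
end
```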

String#slugify

This behaves exactly like String#keyify, with some exceptions:

  1. Underscores _ are replaced with dashes -
  2. Leading numbers are permitted (we remove them in keyify because symbols can't start with a number)
  3. Values are returned as a String, instead of a Symbol

'FooBarBaz'.slugify
# => 'foo-bar-baz'

"Foo-Bar'Baz".slugify
# => 'foo-bar-baz'

'1234FooBAR'.slugify
# => '1234-foo-bar'

# Accepts Symbols
:FooBarBaz.slugify
# => 'foo-bar-baz'

# Also accepts static classes
Math::DomainError.slugify
# => 'math-domain-error'
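
To make the differences from keyify concrete, here is a minimal sketch under the same assumptions as before. The name slugify_sketch is hypothetical, and the sketch does not attempt to trim leading or trailing dashes, since the examples here don't show what the library does in that case.

```ruby
# Hypothetical stand-in for slugify: dashes instead of underscores,
# leading numbers kept, and a String returned instead of a Symbol.
def slugify_sketch(value)
  value.to_s
       .gsub(/[^A-Za-z0-9]/, '-')          # non-alphanumerics become dashes
       .gsub(/([a-z\d])([A-Z])/, '\1-\2')  # split CamelCase boundaries
       .gsub(/-+/, '-')                    # collapse consecutive dashes
       .downcase
end

slugify_sketch('1234FooBAR')
slugify_sketch(Math::DomainError)
```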

There is also a matching String#slugify! bang variant that follows the same rules as its keyify! counterpart.

String#match?

Ruby's match method is often used in boolean operations to determine the presence or absence of a given pattern within a string. That's why we found it odd that Ruby didn't include a shortcut method to return a boolean result (a native String#match? only arrived in Ruby 2.4).

match? operates exactly like match, and simply returns true or false based on the results of the lookup.

'hello'.match?('he')
# => true

'hello'.match?('o')
# => true

'hello'.match?('(.)')
# => true

'hello'.match?(/(.)/)
# => true

'hello'.match?('xx')
# => false

'hello'.match?('he', 1)
# => false
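
In plain Ruby terms, the behavior shown above amounts to checking whether match returned anything. A minimal sketch, using a hypothetical standalone method name (match_sketch?) rather than the library's own code:

```ruby
# Hypothetical stand-in for match?: true when String#match finds something.
# `pos` is passed straight through, so the search starts at that offset.
def match_sketch?(str, pattern, pos = 0)
  !str.match(pattern, pos).nil?
end

match_sketch?('hello', 'he')
match_sketch?('hello', 'he', 1)
```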

String#nl2br

Converts newlines in a string into break tags. Will recognize Unix line feed (\n), standalone carriage returns (\r), and Windows formats (both \r\n and the improperly formatted \n\r).

A Unix newline is appended immediately following each break tag replacement.

"\n".nl2br
# => "<br />\n"

"\n\r".nl2br
# => "<br />\n"

"\r\n".nl2br
# => "<br />\n"

"\n\r\n".nl2br
# => "<br />\n<br />\n"

"\r\n\r\n".nl2br
# => "<br />\n<br />\n"

"\r\r\n".nl2br
# => "<br />\n<br />\n"

"\r\r".nl2br
# => "<br />\n<br />\n"

"\n\r\r".nl2br
# => "<br />\n<br />\n"
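
The results above are consistent with a single substitution that tries the two-character sequences before the single characters. A minimal sketch of that idea, assuming a hypothetical method name (nl2br_sketch); the library's actual pattern may be written differently.

```ruby
# Hypothetical stand-in for nl2br.
# Two-character sequences come first in the alternation so that "\r\n" and
# "\n\r" each count as one line break rather than two.
def nl2br_sketch(str)
  str.gsub(/\r\n|\n\r|\n|\r/, "<br />\n")
end

nl2br_sketch("\r\n\r\n")
nl2br_sketch("\n\r\r")
```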

String#newline_to

Same parsing logic as nl2br, but accepts a replacement string as an argument. Defaults to a space ( ).

# Handles all the different break styles, just like nl2br
"Let's play Global Thermonuclear War.\n\rA strange game.\n\nThe only winning move is not to play.\r\nHow about a nice game of chess?\r".newline_to
# => "Let's play Global Thermonuclear War. A strange game.  The only winning move is not to play. How about a nice game of chess? "

Comes in real handy when reducing a multiline input to a single line.

# imagine a series of IDs coming from a textarea form input
"10001\n10002\n10003\n10004\n10005\n10006".newline_to(',')
# => "10001,10002,10003,10004,10005,10006"

Combine with strip_all to clean up leading and trailing characters.

"\n\n10001\n10002\n10003\n10004\n10005\n10006\n".newline_to(',').strip_all(',')
# => "10001,10002,10003,10004,10005,10006"

You can even add in dedupe for really messy cleanups!

# note that we have a mix of newlines and commas
"\n\n10001\n10002\n10003,\n10004\n\n10005,,10006\n".newline_to(',').strip_all(',').dedupe(',')
# => "10001,10002,10003,10004,10005,10006"
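
If it helps to see what that chain is doing, here is a rough plain-Ruby equivalent of the three steps. Everything in it is an assumption for illustration: newline_to_sketch and clean_id_list_sketch are hypothetical names, and the trimming and squeezing steps use core String methods in place of strip_all and dedupe.

```ruby
# Hypothetical stand-in for newline_to: same line-break pattern as nl2br,
# with a caller-supplied replacement (a space by default).
def newline_to_sketch(str, replacement = ' ')
  # Block form so the replacement is always inserted literally
  str.gsub(/\r\n|\n\r|\n|\r/) { replacement }
end

# Hypothetical helper mirroring newline_to(',').strip_all(',').dedupe(',')
def clean_id_list_sketch(raw)
  joined  = newline_to_sketch(raw, ',')     # line breaks become commas
  trimmed = joined.gsub(/\A,+|,+\z/, '')    # drop leading and trailing commas
  trimmed.squeeze(',')                      # collapse runs of commas
end

clean_id_list_sketch("\n\n10001\n10002\n10003,\n10004\n\n10005,,10006\n")
```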

A bang variant will perform replacements on the object, rather than returning a new one.

txt = "Don't
need
multiple
lines"

txt.newline_to!
txt
# => "Don't need multiple lines"

String#remove_whitespace

Removes all the whitespace from a string. No muss, no fuss.

'   a b c d     e'.remove_whitespace
# => 'abcde'

# Absolutely any string is valid
'. $ ^ { [ ( " | " ) * + ?'.remove_whitespace
# => '.$^{[("|")*+?'

There is a bang variant to perform the removal in place, rather than returning a new object.

str = 'a  b    c d  e'
str.remove_whitespace!
str
# => 'abcde'

String#replace_whitespace

Replace whitespace with the given string.

'1 2 3 4 5'.replace_whitespace('+')
# => '1+2+3+4+5'

There's also a bang variant to replace on the current object.

str = 'a b c d e'
str.replace_whitespace!('-')
str
# => 'a-b-c-d-e'
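
Both whitespace helpers boil down to a substitution on \s. The sketch below uses hypothetical method names and assumes each whitespace character is replaced individually; the examples here only use single spaces for replace_whitespace, so the library may well treat runs of whitespace differently.

```ruby
# Hypothetical stand-in for replace_whitespace.
# Assumption: every whitespace character is replaced on its own.
def replace_whitespace_sketch(str, replacement = '')
  # Block form so the replacement is always inserted literally
  str.gsub(/\s/) { replacement }
end

# Hypothetical stand-in for remove_whitespace: replace with nothing.
def remove_whitespace_sketch(str)
  replace_whitespace_sketch(str, '')
end

remove_whitespace_sketch('   a b c d     e')
replace_whitespace_sketch('1 2 3 4 5', '+')
```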

String#strip_all

Ruby's strip method removes leading and trailing whitespace, but there's no method to strip other characters like dashes, underscores, or numbers. strip_all allows you to perform these kinds of cleanups without having to write any regular expressions.

The lone argument is a string of the characters you want to remove. By default, strip_all will remove dashes - and underscores _.

'___foo___'.strip_all
# => 'foo'

'---foo---'.strip_all
# => 'foo'

Note that the argument is processed as a regex character class (your argument ends up inside of a regex []). This means we evaluate the individual characters of the argument, not an explicit character sequence. You do not need spaces between the characters.

'__-_--foo--_-__'.strip_all
# => 'foo'

'123foo123'.strip_all('321')
# => 'foo'

'xXxXfooXxXx'.strip_all('XYZx')
# => 'foo'

Case-sensitivity still applies.

'ABCfooABC'.strip_all('abc')
# => 'ABCfooABC'

strip_all is intended to be a drop-in enhancement of strip, and will therefore always remove whitespace and newlines, even when providing your own set of characters.

"////   foo   ////\n".strip_all('/')
# => 'foo'

Everything passed in is escaped by default, so you don't have to worry about symbols.

'/[a|valid|regex]+/'.strip_all('/[]+|')
# => 'a|valid|regex'

# The | pipes are still present because they are not leading or trailing in this string.
# Remember, we're enhancing the strip method.

The one exception is when you pass in regex character ranges: 0-9, a-z, and A-Z. Those will be read as expressions to capture all numbers, all lowercase or all uppercase letters, respectively.

'0123456789   foo   9876543210'.strip_all('0-9')
# => 'foo'

'FOO  314   BARBAZ'.strip_all('A-Z')
# => '314'

'hello--314--world'.strip_all('a-z')
# => '--314--'

'hello--314--world'.strip_all('a-z-') # note the extra dash at the end
# => '314'

# you can really shoot yourself in the foot if you're not careful
'hello world'.strip_all('a-z')
# => ''

'abcdefghijklm   foo123   nopqrstuvwxyz'.strip_all('a-z0-9')
# => ''

Variants

We provide the same set of associated methods as strip.

  • lstrip_all removes only leading characters
  • rstrip_all removes only trailing characters
  • All three have bang variants -- strip_all!, lstrip_all!, and rstrip_all! -- that perform the replacement in place, rather than returning a copy.
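
Pulling the rules from this section together, here is a minimal sketch of how the character class might be built and applied. All method names are hypothetical, and restoring the three ranges after escaping is just one assumed way to get the documented behavior; it is not taken from the library's source.

```ruby
# Build the character class: whitespace is always included, everything the
# caller passes is escaped, and only 0-9, a-z and A-Z are restored as ranges.
def strip_class_sketch(chars)
  escaped = Regexp.escape(chars)
  escaped = escaped.gsub('0\\-9', '0-9').gsub('a\\-z', 'a-z').gsub('A\\-Z', 'A-Z')
  "[\\s#{escaped}]"
end

# Hypothetical stand-in for lstrip_all: leading characters only
def lstrip_all_sketch(str, chars = '-_')
  str.sub(/\A#{strip_class_sketch(chars)}+/, '')
end

# Hypothetical stand-in for rstrip_all: trailing characters only
def rstrip_all_sketch(str, chars = '-_')
  str.sub(/#{strip_class_sketch(chars)}+\z/, '')
end

# Hypothetical stand-in for strip_all: both ends
def strip_all_sketch(str, chars = '-_')
  rstrip_all_sketch(lstrip_all_sketch(str, chars), chars)
end

strip_all_sketch('/[a|valid|regex]+/', '/[]+|')
strip_all_sketch('hello--314--world', 'a-z-')
```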

String#to_uuid

Format a string containing a UUID with dashes following standard UUID display format.

require 'securerandom'

str = SecureRandom.uuid
# => "cab93178-ba44-438c-85b1-49fed89e8a6f"
unformatted = str.delete('-')
# => "cab93178ba44438c85b149fed89e8a6f"
unformatted.to_uuid
# => "cab93178-ba44-438c-85b1-49fed89e8a6f"

The method expects a string with...

  1. Only hexadecimal characters: 0-9, A-F (lowercase a-f works too, as in the examples here)
  2. Exact length of 32 characters, which is a UUID without dashes

Everything else will raise an ArgumentError.

# yolo is not found in hexadecimal
"c6cabb4a709a4a448114d2b02132yolo".to_uuid
# => ArgumentError: "c6cabb4a709a4a448114d2b02132yolo" is not a valid UUID

# too long, 33 characters
"a99fee17da4f45128f1dd482c686f1a49".to_uuid
# => ArgumentError: "a99fee17da4f45128f1dd482c686f1a49" is not a valid UUID

"some random string".to_uuid
# => ArgumentError: "some random string" is not a valid UUID

If the UUID is already properly formatted, it will return the unmodified value.

"5a658838-7aa4-488c-98f1-21805a5d2021".to_uuid
# => "5a658838-7aa4-488c-98f1-21805a5d2021"

# Only dashes are accepted as separators
"5a658838 7aa4 488c 98f1 21805a5d2021".to_uuid
# => ArgumentError: "5a658838 7aa4 488c 98f1 21805a5d2021" is not a valid UUID
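
The rules in this section can be sketched in a few lines of plain Ruby. The method name to_uuid_sketch is hypothetical, and accepting both upper and lowercase hex is an assumption based on the examples above rather than a statement about the library's implementation.

```ruby
# Hypothetical stand-in for to_uuid.
def to_uuid_sketch(str)
  # Already dashed in the 8-4-4-4-12 layout: hand it back untouched
  return str if str =~ /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

  # Otherwise it must be exactly 32 hexadecimal characters
  unless str =~ /\A\h{32}\z/
    raise ArgumentError, "#{str.inspect} is not a valid UUID"
  end

  # Slice into the standard 8-4-4-4-12 groups and join with dashes
  [str[0, 8], str[8, 4], str[12, 4], str[16, 4], str[20, 12]].join('-')
end

to_uuid_sketch('cab93178ba44438c85b149fed89e8a6f')
```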