Data Scrubbing Actions

Daniel Hazelbaker edited this page Aug 28, 2023 · 7 revisions

Data scrubbing is a special animal and you probably won't need to use these actions under normal circumstances. Primarily, these are to be used when you want to attempt to scrub your database of all personal information about your members. This is a "best guess" scrub: there is no guarantee that nothing is left, and you should not send this database to anybody you don't trust implicitly.

Note that only core tables are scrubbed. Any tables added by plugins are not checked. Also, if a Rock update adds new columns, they may not be checked until an update to this tool has been released.

Analytics Source Tables (Remove Data)

Completely empties the contents of all AnalyticsSource... tables. These tables contain pre-existing data from your Rock instance, but stored in a manner that speeds up data queries. There is a Rock Job that re-populates these tables so they are safe to empty.

Background Checks (Sanitize Data)

The BackgroundCheck table can sometimes contain sensitive data in the form of logged information on the requests. This first blanks out all those fields.

Next, all Attribute Values are searched for links to background check files.

Finally, we change the name of any existing background check workflow instances to match the generated name (this assumes you actually scrubbed the names too).

Background Checks (Remove Data)

The BackgroundCheck table can sometimes contain sensitive data in the form of logged information on the requests. This removes all background check data that can be found from the database.

Benevolence Requests

Cleans up all benevolence requests by replacing any government Id values with generated values (GEN#####). Any Request Text and Response Summary is replaced with Lorem Ipsum content.

Communications

Replaces the subject and message content associated with sent communications with lorem ipsum text. This includes the content that would be sent for SMS messages and push notifications.

Also replaces the response messages from people outside of Rock with lorem ipsum text.

Connection Requests

All connection opportunities get their name, public name and description replaced with lorem ipsum text.

All connection requests get their comments replaced with lorem ipsum text.

All connection request activities (those notes you add) get the note text replaced with lorem ipsum text.

Content Channel Items

This is a hard one, but basically what we attempt to do is replace every non-HTML word with lorem ipsum words instead. So if the original Content for the item was <p>Some sensitive data</p> you would end up with <p>Lorem ipsum negal</p> - hopefully. It's hard to be sure this one catches everything because content channel items are used for everything and there is really no telling what data is actually in here.
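As a rough illustration of the idea (not the tool's actual implementation), the sketch below walks the content with a regular expression, copies HTML tags and entities through untouched, and swaps every other word for a random lorem ipsum word. The word pool, the pattern and the function name `scrub_html_words` are all assumptions made for this example.

```python
import random
import re

# Hypothetical word pool; the tool's real vocabulary is not documented here.
LOREM_WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing", "elit"]

# Group 1: a whole HTML tag or entity (kept as is). Group 2: a run of letters (replaced).
TOKEN = re.compile(r"(<[^>]*>|&[#\w]+;)|([A-Za-z]+)")


def scrub_html_words(html, rng=None):
    """Replace every word outside of tags/entities with a random lorem ipsum word."""
    if rng is None:
        rng = random.Random()

    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # markup passes through unchanged
        word = rng.choice(LOREM_WORDS)
        # Keep the capitalization of the first letter so sentences still look natural.
        return word.capitalize() if match.group(2)[0].isupper() else word

    return TOKEN.sub(replace, html)


print(scrub_html_words("<p>Some sensitive data</p>"))
```

A pattern like this treats anything between `<` and `>` as markup, so text hidden inside tag attributes (a `title` or `alt` value, for instance) would survive. That is one reason it is hard to be sure everything is caught.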

Devices

Check-in device IP addresses are replaced with random addresses (172.16.x.y). If the device address is a hostname, it is replaced with device-###.rocksolidchurchdemo.com. If the address was 127.0.0.1 or ::1, it is left as is.
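The three cases above can be sketched as follows. This is only an illustration of the rules as described; the function name `scrub_device_address` and the exact number ranges are assumptions, not taken from the tool.

```python
import ipaddress
import random

LOOPBACKS = {"127.0.0.1", "::1"}


def scrub_device_address(address, rng=None):
    """Return a scrubbed stand-in for a check-in device address."""
    if rng is None:
        rng = random.Random()
    if address in LOOPBACKS:
        return address  # loopback addresses are left as is
    try:
        ipaddress.ip_address(address)
    except ValueError:
        # Not an IP address, so treat it as a hostname.
        return f"device-{rng.randint(0, 999):03d}.rocksolidchurchdemo.com"
    # Assumed ranges for the x.y part of 172.16.x.y.
    return f"172.16.{rng.randint(0, 255)}.{rng.randint(1, 254)}"


for original in ["127.0.0.1", "10.1.20.15", "kiosk1.example.org"]:
    print(original, "->", scrub_device_address(original))
```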

Email Addresses

Searches the database for any e-mail address and replaces it with a generated one (in the format of user####@fakeinbox.com). This searches Person records, other tables that contain known e-mail address fields and also many Attribute Value types that might contain e-mail addresses. In the case of the full-text fields a regular expression is used to find e-mail addresses.

Also replaces the global attributes EmailExceptionsList and OrganizationEmail with generated values.
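For the full-text fields, the regular expression pass might look something like the sketch below. The pattern is a deliberately simple one written for this example, and mapping a repeated address to the same generated address is an assumption; the tool's real pattern and numbering scheme may differ.

```python
import itertools
import re

# Simplified e-mail pattern (an assumption for this sketch, not the tool's actual regex).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def make_email_scrubber():
    """Build a scrub function that hands out user####@fakeinbox.com addresses."""
    counter = itertools.count(1)
    seen = {}  # original address (lowercased) -> generated address

    def scrub(text):
        def replace(match):
            original = match.group(0).lower()
            if original not in seen:
                seen[original] = f"user{next(counter):04d}@fakeinbox.com"
            return seen[original]

        return EMAIL.sub(replace, text)

    return scrub


scrub = make_email_scrubber()
print(scrub("Contact ted@example.com or cindy@example.com for details."))
```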

Exception Log (Remove Data)

The Exception Log in Rock can sometimes be rather massive. You probably don't need all the exceptions that happened on production to be available for review on your sandbox. While you could go in and clear those out yourself, this lets you do it as part of the sweeping operation.

Financial Records

Updates various Financial... tables to remove any identifying or sensitive information. These tables contain various bits of information like partial CC numbers, name on card, saved account information and more.

Group Names

Many churches use the group leader name as the group name for small groups. This action will attempt to replace any person names from your database with new randomized names in the group name. For example, "Ted Decker's Small Group" might become "Frank Smith's Small Group".
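One way to picture this step, assuming you already have a map from original person names to their generated replacements (how the tool actually builds and applies that map is not documented here), is a single search-and-replace pass over each group name. The helper `scrub_group_name` is hypothetical.

```python
import re


def scrub_group_name(group_name, name_map):
    """Replace any known person name inside a group name with its generated name."""
    if not name_map:
        return group_name
    lookup = {original.lower(): generated for original, generated in name_map.items()}
    # Longest names first so "Ted Decker Jr" would win over "Ted Decker".
    alternatives = sorted((re.escape(name) for name in name_map), key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda match: lookup[match.group(0).lower()], group_name)


print(scrub_group_name("Ted Decker's Small Group", {"Ted Decker": "Frank Smith"}))
```

Doing the replacement in one pass means a freshly inserted generated name is never matched again as if it were an original name.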

History

Attempts to sanitize the History data with HIDDEN values. This should catch nearly all history records and hide the original values. For example, after the scrub you should only see things like Connection Status changed from HIDDEN to HIDDEN.

Interactions

This is another wild west. We simply clear the custom data from each table (ComponentData, ChannelData, and InteractionData). We also remove all session location data (geocoded details and IP addresses).

Interactive Experiences (Remove Data)

All data related to interactive experiences is removed from the database.

Location Addresses (Generate)

This action requires you to have a developer account with here.com (free up to 250,000 lookups per month). This action has a number of sub-actions it runs depending on the types of addresses found.

  • Address Locations that are not geo-coded. These addresses are just a text string and may not even be valid. Since we don't have a geo-coded location to work with, we just generate a random address.
    • If the original address is outside the US then we generate a random address from somewhere in the world.
    • If the original address is inside the US but outside the state of your church, then a random address somewhere inside the US is generated.
    • Finally, if the original address is inside the US and also inside the same state as your church, then a random address in the Phoenix, AZ area is chosen. If this is where your church actually is, well, sorry you are out of luck.
  • Next we look for any address locations that are geo-coded and within a 35-mile radius of the address of your church - specifically, the OrganizationAddress global attribute. This will fail miserably if that address is not geocoded.
    • All matching addresses are shifted into the Phoenix, AZ area, centered around a specific point (configurable in the preferences if you need to change it). The addresses stay relative to their original location, meaning if the address was 5.3 miles north of your church, it will be 5.3 miles north of the target center location. There is also a preference that adds a +/- 1 mile jitter to the shifted addresses so they cannot easily be reversed. Once the geo-point has been shifted, we use here.com to reverse lookup from the geo-point to get an actual address. If we cannot get one, then some random address will be used instead.
  • The last set of addresses are those that are geo-coded, but outside the 35-mile radius. These addresses all remain where they are but get a 1 mile jitter applied to them. Once the jitter is applied we again do a reverse geo-code lookup to get the address of the new point.
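The shift-and-jitter step for geo-coded addresses near your church can be sketched as below. This is a simplified flat-earth approximation written for illustration only; the constant, the function names and the Phoenix coordinates in the usage lines are assumptions, and the reverse lookup against here.com is left out entirely.

```python
import math
import random

MILES_PER_DEGREE_LAT = 69.0  # rough approximation, good enough for a sketch


def offset_miles(origin, point):
    """Return (north, east) distance in miles from origin to point; both are (lat, lon)."""
    north = (point[0] - origin[0]) * MILES_PER_DEGREE_LAT
    east = (point[1] - origin[1]) * MILES_PER_DEGREE_LAT * math.cos(math.radians(origin[0]))
    return north, east


def move_by_miles(origin, north, east):
    """Return the (lat, lon) that lies the given miles north and east of origin."""
    lat = origin[0] + north / MILES_PER_DEGREE_LAT
    lon = origin[1] + east / (MILES_PER_DEGREE_LAT * math.cos(math.radians(origin[0])))
    return lat, lon


def shift_point(point, church, target, jitter_miles=1.0, rng=None):
    """Move point so it sits relative to target as it sat relative to church, plus jitter."""
    if rng is None:
        rng = random.Random()
    north, east = offset_miles(church, point)
    north += rng.uniform(-jitter_miles, jitter_miles)
    east += rng.uniform(-jitter_miles, jitter_miles)
    return move_by_miles(target, north, east)


church = (40.0, -83.0)            # hypothetical OrganizationAddress geo-point
target = (33.4484, -112.0740)     # assumed center point in the Phoenix, AZ area
home = move_by_miles(church, 5.3, 0.0)  # an address 5.3 miles north of the church
print(shift_point(home, church, target))
```

For addresses outside the 35-mile radius, the same helpers would apply with the address itself as both `church` and `target`, so that only the jitter moves the point.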

A large database can easily have over 100,000 geo-coded addresses. Since you only get 250,000 free lookups per month, you may only get one or two runs a month before having to wait. The here.com website will show how many lookups you have performed, but it takes about 24 hours for it to update.

If your data is not going to a third party, consider using the Location Addresses (Shuffle) action instead.

Location Addresses (Shuffle)

This is similar to the Location Addresses (Generate) action but is less stressful on the server. This method also does not require access to a developer.here.com account. Processing happens by dividing the locations into 3 different batches:

  • Address Locations that are not geo-coded.
  • Address Locations that are geo-coded and within a 35-mile radius of the address of your church - specifically, the OrganizationAddress global attribute. This will fail miserably if that address is not geocoded.
  • Address Locations that are geo-coded, but outside the 35-mile radius.

Each batch of locations has its location data randomized. Let's say you have 5 addresses in the batch. We make a list of all those Id numbers (1, 2, 3, 4, 5). Then we take the first location record (number 1) and pick a random Id number from that list (it comes up 4). We then update record 4 in the database with the information from record 1. Then we remove 4 from the Id number list and process the next record. So for location record number 2, we pick a random number from the remaining Id numbers (1, 2, 3, 5). And so on.
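The procedure described above amounts to drawing a random permutation of the batch. A minimal in-memory sketch, assuming each batch is a dictionary of Id to address data (the real action works against the database, and `shuffle_locations` is a name made up for this example):

```python
import random


def shuffle_locations(locations, rng=None):
    """Reassign each record's address data to a randomly chosen Id from the same batch."""
    if rng is None:
        rng = random.Random()
    remaining = list(locations)  # Ids that have not yet received new data
    shuffled = {}
    for source_id, data in locations.items():
        # Pick a random target Id and remove it from the list so it is used only once.
        target_id = remaining.pop(rng.randrange(len(remaining)))
        shuffled[target_id] = data
    return shuffled


batch = {1: "10 Oak St", 2: "22 Elm Ave", 3: "5 Pine Rd", 4: "81 Main St", 5: "7 Lake Dr"}
print(shuffle_locations(batch))
```

The sketch reads from the original dictionary and writes to a new one, so a record that has already been overwritten is never used as a source. Note that a record can draw its own Id (record 2 is still in the list when record 2 is processed), in which case that address stays where it was.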

This means all the original address data is still in the database, but the people are no longer associated with the same addresses. This coupled with the random names action should give you a sufficient safety net for a development server.

Logins

Replaces all login usernames with "random" usernames in the format of fakeuser####. Remember, this replaces all usernames, so the username you normally use to log in to Rock will be changed as well.

Media Element Data

All the names and descriptions of accounts, folders and media items are replaced with lorem ipsum text. All file thumbnail images and file links are removed.

Names

The database is searched for any person names. These names are replaced with generated fake data.

First the Person table is processed. The logic is somewhat complex, but the gist of it is that if you have 4 people in your family and you all have the same last name, then the generated last name will also be the same for each of you. If 3 of you have the same last name but one has a different last name, then the 3 of you will have a shared generated last name and the last person will have a different generated last name. Also, the first names will be gender specific whenever possible (meaning if the record is marked as Male it will generate a Male name). If the original record had a nickname then it will be replaced with a generated name as well.
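To make the family rule concrete, here is a small sketch under simplifying assumptions: each family is a list of dictionaries, the name pools are tiny made-up lists, and a family never has more distinct last names than the pool holds. The field names and the function `scrub_family` are hypothetical and do not reflect Rock's schema or the tool's code.

```python
import random

# Hypothetical name pools; the tool ships its own, much larger lists.
MALE_FIRST_NAMES = ["Frank", "Marcus", "Owen", "Peter"]
FEMALE_FIRST_NAMES = ["Alice", "Grace", "Nora", "Ruth"]
LAST_NAMES = ["Smith", "Jones", "Baker", "Lopez", "Young", "Patel"]


def scrub_family(members, rng=None):
    """Return scrubbed copies of the family members with consistent last names."""
    if rng is None:
        rng = random.Random()
    generated_last = {}  # original last name (lowercased) -> generated last name
    scrubbed = []
    for member in members:
        key = member["last_name"].lower()
        if key not in generated_last:
            # Different original last names must not collapse into one generated name.
            unused = [name for name in LAST_NAMES if name not in generated_last.values()]
            generated_last[key] = rng.choice(unused)
        if member["gender"] == "Male":
            pool = MALE_FIRST_NAMES
        elif member["gender"] == "Female":
            pool = FEMALE_FIRST_NAMES
        else:
            pool = MALE_FIRST_NAMES + FEMALE_FIRST_NAMES
        scrubbed.append({
            **member,
            "first_name": rng.choice(pool),
            # Only records that had a nickname get a generated one.
            "nick_name": rng.choice(pool) if member["nick_name"] else "",
            "last_name": generated_last[key],
        })
    return scrubbed


family = [
    {"first_name": "Ted", "nick_name": "Ted", "last_name": "Decker", "gender": "Male"},
    {"first_name": "Cindy", "nick_name": "Cindy", "last_name": "Decker", "gender": "Female"},
    {"first_name": "Alex", "nick_name": "", "last_name": "Marble", "gender": "Unknown"},
]
for person in scrub_family(family):
    print(person["first_name"], person["last_name"])
```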

After the Person table has been processed, we then process benevolence requests, prayer requests, event registrations, and previous last names. Whenever possible, if the record links to the person then we use the newly generated names from the Person table - otherwise a new generated name is used.

Finally there are a few additional tables that just have a generic "name" field that we scrub and always generate a random name for.

Notes

All note text is replaced with lorem ipsum text.

Notification Messages

All notification messages have their title and description replaced with lorem ipsum text.

Organization and Campuses

Changes your global attribute values for OrganizationName, OrganizationAbbreviation and OrganizationWebsite to contain generated random data. Also changes the name of each campus to a random generated name.

If the campus has a URL value, then it gets a new generated web address. If it has a Description then the Description is replaced with Lorem Ipsum data.

Person Alternate Identifiers

Any person alternate identifiers that are e-mail addresses get replaced with fake e-mail addresses. Other identifiers are replaced with random characters.

Person Tokens (Remove Data)

Person Tokens are what let users log in without their password, via the rckipid URL parameter. You probably don't need these on your sandbox, so this allows you to clear them out.

Phone Numbers

Any Person phone numbers are replaced with completely random phone numbers - hopefully none that actually work. After the Person Phone Number table is scrubbed, we search any attribute values that contain phone numbers using regular expression matching. Finally any generic "phone number" columns in various Rock tables are scrubbed.

The global attribute OrganizationPhone is also scrubbed.
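The regular expression pass over attribute values could look roughly like the sketch below. The pattern only covers common 10-digit US formats and is an assumption for this example, as is the choice to keep the original punctuation and swap only the digits; the tool's actual matching rules are not documented here.

```python
import random
import re

# Matches forms like (623) 555-0142, 623-555-0142, 623.555.0142 and 6235550142.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")


def scrub_phone_numbers(text, rng=None):
    """Replace the digits of every phone number found in text with random digits."""
    if rng is None:
        rng = random.Random()

    def replace(match):
        # Keep the punctuation, randomize each digit.
        return re.sub(r"\d", lambda _: str(rng.randrange(10)), match.group(0))

    return PHONE.sub(replace, text)


print(scrub_phone_numbers("Call (623) 555-0142 or 623-555-0199 after 5pm."))
```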

Remote Authentication Sessions

These hold information about the remote devices that authenticated to your server. This data is scrubbed to replace IP addresses with random IP addresses.

Web Farm Data

Web Farm node names are replaced with lorem ipsum words. Log messages are replaced with lorem ipsum text.

Workflow Log

The Workflow Log can often include sensitive data. For example, when you set an attribute value it is logged (assuming you have logging turned on). This action attempts to scrub those messages and replace the actual action message with HIDDEN.