Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

randgen: add PopulateTableWithRandData #75677

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Jan 28, 2022

PopulateTableWithRandData populates the caller's table with random data. This helper
function aims to make it easier for engineers to develop randomized tests that
leverage randgen / sqlsmith.

Informs #72345

Release note: None

@msbutler msbutler requested review from mgartner and a team January 28, 2022 22:51
@msbutler msbutler self-assigned this Jan 28, 2022
@msbutler msbutler requested a review from a team as a code owner January 28, 2022 22:51
@cockroach-teamcity
Copy link
Member

This change is Reviewable

pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-rand-data branch 6 times, most recently from 5ba2007 to bf834a2 Compare January 31, 2022 18:07
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks useful! Just some drive-by comments.

pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
pkg/sql/randgen/schema.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 6 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msbutler and @stevendanna)


bazel-out, line 1 at r3 (raw file):

/private/var/tmp/_bazel_michaelbutler/13ba282fa2b19539d0c969c1113bb37c/execroot/cockroach/bazel-out

Is this file necessary, or did you commit it by error?


pkg/sql/randgen/schema.go, line 221 at r3 (raw file):

		if colTypes[j].Family() == types.OidFamily {
			// choose 0 or 1 as the OID value as the value must be less than the
			// number of tables in the database.

As far as I can tell, there is only a limit on REGTYPE, not any other types in the OID family (OID, REGCLASS, REGPROC, REGPROCEDURE, REGROLE, REGNAMESPACE).


pkg/sql/randgen/schema.go, line 231 at r3 (raw file):

	}
	valBuilder.WriteString(")")
	return valBuilder

nit: might as well return a string instead of the builder


pkg/sql/randgen/schema.go, line 237 at r3 (raw file):

func PopulateTableWithRandData(rng *rand.Rand, db *gosql.DB, tableName string, numRows int) error {
	var ignored, createStmtSQL string
	res := db.QueryRow(fmt.Sprintf("SHOW CREATE TABLE %s", tableName))

nit: You don't need ignored if you change the query to SELECT create_statement FROM [SHOW CREATE TABLE %].


pkg/sql/randgen/schema.go, line 284 at r3 (raw file):

			} else {
				nullable = append(nullable, false)
			}

nit: this can be simplified to nullable = append(nullable col.Nullable.Nullability == tree.Null).


pkg/sql/randgen/schema.go, line 295 at r3 (raw file):

		fail    int // number of failed insert statements
	)
	maxTries := numRows * 10

I'm a little worried that trying only 10 times per row will cause the tests to flake. Have you tried stressing the test to see how likely it is to fail?


pkg/sql/randgen/schema.go, line 309 at r3 (raw file):

			if fail > maxTries {
				// This could mean PopulateTableWithRandomData or RandDatum couldn't
				// handle this table's schema. consider filing a bug.

nit: consider => Consider


pkg/sql/randgen/schema.go, line 310 at r3 (raw file):

				// This could mean PopulateTableWithRandomData or RandDatum couldn't
				// handle this table's schema. consider filing a bug.
				return errors.Errorf(`could not populate %d rows for table with schema \n \t %s \n--

This function isn't being used yet, so I'm not sure what the usage pattern will be, but I'm wondering if it would make more sense to return no error in this case. Perhaps a better API would be something like:

func PopulateTableWithRandData(
    rng *rand.Rand,
    db *gosql.DB,
    tableName string,
    numInserts int,
) (numRowsInserted int, err error)

The function would attempt exactly numInserts INSERT statements, none of which are guaranteed to succeed. It would return the number of rows successfully inserted. An error is returned if no inserts could be attempted (the table doesn't exist, or the create statement failed to parse). I think this would make the behavior of this function more predictable and let callers decide how many inserts they want to perform and how to proceed given the number of successfully inserted rows.

Copy link
Collaborator Author

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @mgartner and @stevendanna)


pkg/sql/randgen/schema.go, line 221 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

As far as I can tell, there is only a limit on REGTYPE, not any other types in the OID family (OID, REGCLASS, REGPROC, REGPROCEDURE, REGROLE, REGNAMESPACE).

Done.


pkg/sql/randgen/schema.go, line 295 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I'm a little worried that trying only 10 times per row will cause the tests to flake. Have you tried stressing the test to see how likely it is to fail?

I got it to pass the stress test a few times with a multiplier of 10, though I'll bump it to 100. In a previous iteration of this test, I was hitting OOM errors on the stress test, which compelled me to lower the multiplier, but a lot has changed since I was OOMing.


pkg/sql/randgen/schema.go, line 310 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

This function isn't being used yet, so I'm not sure what the usage pattern will be, but I'm wondering if it would make more sense to return no error in this case. Perhaps a better API would be something like:

func PopulateTableWithRandData(
    rng *rand.Rand,
    db *gosql.DB,
    tableName string,
    numInserts int,
) (numRowsInserted int, err error)

The function would attempt exactly numInserts INSERT statements, none of which are guaranteed to succeed. It would return the number of rows successfully inserted. An error is returned if no inserts could be attempted (the table doesn't exist, or the create statement failed to parse). I think this would make the behavior of this function more predictable and let callers decide how many inserts they want to perform and how to proceed given the number of successfully inserted rows.

I like this idea! Then, schema_test will pass if at least one of the 10 randomly generated tables had a numRowsInserted>0.


bazel-out, line 1 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is this file necessary, or did you commit it by error?

whoops. removed.

PopulateRandTable populates the caller's table with random data. This helper
function aims to make it easier for engineers to develop randomized tests that
leverage randgen / sqlsmith.

I considered adding random insert statements into sqlsmith's randtables setup,
however the high probably of a faulty insert statement would cause the whole
setup to fail. See cockroachdb#75159

In the future, I'd like to develop a new helper function
PopulateDatabaseWithRandData which calls PopulateTableWithRandData on each
table in the order of the fk dependency graph.

Informs cockroachdb#72345

Release note: None
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner and @stevendanna)

@msbutler
Copy link
Collaborator Author

TFTR!!!

bors r=mgartner

@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build failed:

@msbutler
Copy link
Collaborator Author

bors r=mgartner

@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build succeeded:

@craig craig bot merged commit 9132eac into cockroachdb:master Feb 15, 2022
@msbutler msbutler deleted the butler-rand-data branch February 15, 2022 17:43
@msbutler msbutler changed the title randgen: add PopulateRandTable randgen: add PopulateTableWithRandData Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants