to-close: added long-running materialization support to wren #374

Closed
wants to merge 3 commits into from

@epinzur epinzur commented May 23, 2023

No description provided.

@github-actions github-actions bot added the wren label May 23, 2023
preparedFile.Path = preparedFile.Path
subLogger.Debug().Interface("prepared_file", preparedFile).Msg("these paths should be URIs")
}

Collaborator Author

This code does nothing (it assigns preparedFile.Path to itself), so I'm removing it.

ReconcileMaterialzations(ctx context.Context) error
RunMaterializations(ctx context.Context, owner *ent.Owner)
StartMaterialization(ctx context.Context, owner *ent.Owner, materializationID string, compileResp *v1alpha.CompileResponse, destination *v1alpha.Destination) error
StopMaterialization(ctx context.Context, materializationID string) error
Collaborator Author

Reorganized this section and added a few materialization methods.

@@ -511,7 +515,7 @@ func (m *Manager) processMaterializations(requestCtx context.Context, owner *ent
return nil
}

- materializations, err := m.materializationClient.GetAllMaterializations(ctx, owner)
+ materializations, err := m.materializationClient.GetMaterializationsBySourceType(ctx, owner, materialization.SourceTypeFiles)
Collaborator Author

@epinzur epinzur May 23, 2023

This kicks off the old-style materializations, which wren manages by running compute at data-load time. It now runs only for materializations that are based on file sources.

Collaborator

I wonder if we could rip this code out at some point soon. Maybe we should make a note somewhere to do so.

"github.com/kaskada-ai/kaskada/wren/ent"
"github.com/kaskada-ai/kaskada/wren/ent/materialization"
"github.com/rs/zerolog/log"
)
Collaborator Author

These are wrappers around the new compute engine methods for managing long-lived materializations.

return statusResponse.Progress, nil
}

func (m *Manager) ReconcileMaterialzations(ctx context.Context) error {
Collaborator Author

Running this should ensure that every materialization in the DB is running in the compute engine.

Collaborator

@jordanrfrazier jordanrfrazier May 23, 2023

Can you add this as a comment - or can we choose a more descriptive name?
I see it's being run periodically, can you clarify why?

@@ -104,6 +104,7 @@ func convertKaskadaTableToComputeTable(kaskadaTable *ent.KaskadaTable) *v1alpha.
TimeColumnName: kaskadaTable.TimeColumnName,
GroupColumnName: kaskadaTable.EntityKeyColumnName,
Grouping: kaskadaTable.GroupingID,
Source: kaskadaTable.Source,
Collaborator Author

Pass the table source along to compute.

@@ -31,6 +31,7 @@ func (Materialization) Fields() []ent.Field {
field.Bytes("slice_request").GoType(&v1alpha.SliceRequest{}).Immutable(),
field.Bytes("analysis").GoType(&v1alpha.Analysis{}).Immutable(),
field.Int64("data_version_id"),
field.Enum("source_type").Values("unspecified", "files", "streams").Default("unspecified"),
Collaborator Author

Adds a source_type field for materializations to the DB.

return materializations, nil
}

func (c *materializationClient) GetAllMaterializationsBySourceType(ctx context.Context, sourceType materialization.SourceType) ([]*ent.Materialization, error) {
Collaborator Author

Same as above, but for all owners/clients instead of just one.

return nil, customerrors.NewInvalidArgumentErrorWithCustomText("cannot materialize tables from different source types")
}
}

Collaborator Author

Guards against creating a materialization from tables with mixed source types.
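
A guard of this kind could be sketched as follows. This is an illustration only: the `SourceType` type and the `commonSourceType` helper are hypothetical, with the enum values and the error text taken from the excerpts in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// SourceType is a hypothetical stand-in for materialization.SourceType.
type SourceType string

const (
	SourceTypeUnspecified SourceType = "unspecified"
	SourceTypeFiles       SourceType = "files"
	SourceTypeStreams     SourceType = "streams"
)

var errMixedSources = errors.New("cannot materialize tables from different source types")

// commonSourceType returns the single source type shared by all tables, or
// an error if the tables disagree. An empty input yields "unspecified".
func commonSourceType(tableSources []SourceType) (SourceType, error) {
	result := SourceTypeUnspecified
	for i, st := range tableSources {
		if i == 0 {
			result = st
			continue
		}
		if st != result {
			return SourceTypeUnspecified, errMixedSources
		}
	}
	return result, nil
}

func main() {
	fmt.Println(commonSourceType([]SourceType{SourceTypeFiles, SourceTypeFiles}))
	fmt.Println(commonSourceType([]SourceType{SourceTypeFiles, SourceTypeStreams}))
}
```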

switch sourceType {
case materialization.SourceTypeFiles:
subLogger.Debug().Msg("running materializations")
s.computeManager.RunMaterializations(ctx, owner)
Collaborator Author

If this is a file-backed materialization, run it as before.

s.computeManager.RunMaterializations(ctx, owner)
case materialization.SourceTypeStreams:
subLogger.Debug().Msg("adding materialization to compute")
err := s.computeManager.StartMaterialization(ctx, owner, createdMaterialization.ID.String(), compileResp, createdMaterialization.Destination)
Collaborator Author

If this is a stream-backed materialization, attempt to start it on compute.

})
Expect(err).Should(HaveOccurred())
Expect(err.Error()).Should(Equal("cannot materialize tables from different source types"))
Expect(response).Should(BeNil())
Collaborator Author

Tests the guard against materializing from multiple source types.

@epinzur epinzur changed the title from "draft: add long-running materialization support to wren" to "added long-running materialization support to wren" May 23, 2023
store: *objectStoreClient,
tableStore: tableStore,
tr: otel.Tracer("ComputeManager"),
runningMaterailizations: map[string]struct{}{},
Collaborator

Spelling: runningMaterailizations should be runningMaterializations.

@@ -52,22 +53,25 @@ type Manager struct {
store client.ObjectStoreClient
tableStore store.TableStore
tr trace.Tracer

runningMaterailizations map[string]struct{}
Collaborator

Spelling: runningMaterailizations should be runningMaterializations.

wren/compute/interface.go (outdated, resolved)

func (m *Manager) StartMaterialization(ctx context.Context, owner *ent.Owner, materializationID string, compileResp *v1alpha.CompileResponse, destination *v1alpha.Destination) error {
subLogger := log.Ctx(ctx).With().Str("method", "manager.StartMaterialization").Str("materialization_id", materializationID).Logger()

tables, err := m.getMaterializationTables(ctx, owner, compileResp)
Collaborator

Should we start moving away from Tables and preferring Input or Source?

return err
}

newRunningMaterailizations := make(map[string]struct{})
Collaborator

Spelling: newRunningMaterailizations should be newRunningMaterializations.

}

if isRunning {
newRunningMaterailizations[materializationID] = struct{}{}
Collaborator

Hm, what is the struct{}{} holding?
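
For context on the question above: in Go, `map[string]struct{}` is a common way to represent a set. The empty struct type `struct{}` has zero size, so the map values take no space, and `struct{}{}` is simply the one value of that type. A small illustration, with made-up IDs and a hypothetical `contains` helper:

```go
package main

import "fmt"

// contains reports whether id is a member of the set.
func contains(set map[string]struct{}, id string) bool {
	_, ok := set[id]
	return ok
}

func main() {
	// A set of materialization IDs; the values carry no data.
	running := map[string]struct{}{}
	running["mat-1"] = struct{}{}

	fmt.Println(contains(running, "mat-1"), contains(running, "mat-2"))
}
```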

}
}

// find all materializations that were running in the previous iteration but don't exist anymore
Collaborator

Can you elaborate on this logic? What is a previous iteration in this context?
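
One possible reading of the code comment above, sketched under the assumption that each reconcile pass builds a fresh set of running IDs and compares it with the set kept from the prior pass. The helper name `stoppedSince` is hypothetical and not from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// stoppedSince returns the IDs present in prev but absent from curr,
// sorted so the result is deterministic.
func stoppedSince(prev, curr map[string]struct{}) []string {
	var gone []string
	for id := range prev {
		if _, ok := curr[id]; !ok {
			gone = append(gone, id)
		}
	}
	sort.Strings(gone)
	return gone
}

func main() {
	prev := map[string]struct{}{"a": {}, "b": {}, "c": {}}
	curr := map[string]struct{}{"b": {}}
	fmt.Println(stoppedSince(prev, curr))
}
```

Under this reading, the IDs returned would be the ones to stop in the compute engine, after which the current set replaces the previous one for the next pass.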

if err != nil {
return nil, err
}

err = s.materializationClient.DeleteMaterialization(ctx, owner, materialization)
if foundMaterialization.SourceType == materialization.SourceTypeStreams {
Collaborator

I assume that in the future we'll also have a way to run a long-running file-backed materialization, so maybe add a TODO here.

@epinzur epinzur changed the title from "added long-running materialization support to wren" to "to-close: added long-running materialization support to wren" May 25, 2023
@epinzur epinzur closed this May 25, 2023
@epinzur epinzur deleted the wren/pulsar2 branch May 25, 2023 16:49
epinzur added a commit that referenced this pull request May 25, 2023
@jordanrfrazier this is a different branch from the previous PR, #374, which I'm going to close.
2 participants