Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propose new search attribute APIs #67

Closed
wants to merge 2 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
86 changes: 86 additions & 0 deletions typescript/search-attributes.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,86 @@
## Search attributes V2

### Why do we need this?

In the current version of the SDK, search attributes are always arrays. In order for the server to support visibility
on top of standard persistence, we need to redo the search attributes API to enforce single values for all search
attributes except for keywords. The server will eventually drop support for multiple search attribute values for
non-keywords.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could maintain the array & support for multiple values when backed by ES, while supporting single-value arrays for standard persistence. There are use cases for it. Was it decided that there are not enough use cases to warrant the complexity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want different experience based on the backend.
Research shows that multiple values are almost never used.


### Minimal proposal

Make the E2E experience consistent, including query construction for listing workflows and search attribute
registration.

`workflows.ts`

```ts
import { searchAttributes } from '@temporalio/common';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does searchAttributes look like? Can I iterate all of them inside the workflow? Can I do things like unset all of the ones with a certain prefix? (i.e. a replace-all)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a set of functions and constant, it includes things like:

defineAttribute
WorkflowType // built-in attribute

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I iterate all of them inside the workflow? Can I do things like unset all of the ones with a certain prefix? (i.e. a replace-all)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd have to define your own set of search attributes, it's up to you how to access those.
We can expose an API to list the set of search attributes, do you think it's valuable?
Since search attributes have to be registered, I don't know if there's a real need for it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeScript doesn't allow object destructuring on imports statements. That means that one would always have to write searchAttributes.defineAttribute(...), instead of simply defineAttribute(...).

I think defineAttribute should be outside of this object, and searchAttributes should contain only build-in search attributes...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like 2

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SearchAttributes is essentially an enum, from user's POV

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like 2 as well.
Not sure why SearchAttributes are an enum..

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "built in search attributes" should be grouped together. Not formally an enum, but an object on which every field is a builtin search attribute.

Having an export named simply WorkflowType gives the impression that it is a type. Writing it with a lower case makes it only marginally better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, maybe have a builtins constant with all of the builtin search attributes.


export const customerAttribute = searchAttributes.defineAttribute(searchAttributes.Text, 'customer');
export const dateOfPurchaseAttribute = searchAttributes.defineAttribute(Date, 'dateOfPurchase');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might drop define here. You're not really defining it in the verb sense, especially if it already exists, rather you are referring to some existing definition. Perhaps attributeDefinition then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be fine with just attribute TBH.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the verb define for consistency with defineQuery and defineSignal.


export async function myWorkflow() {
// Get the value of a search attribute - returns string | undefined here
const customer = customerAttribute.value();
// ... random workflow logic ...

// Set the value of a search attribute, all set and unset calls are buffered and flushed out as a single command at
// the end of an activation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and unset calls are buffered and flushed out as a single command at the end of an activation

I assume last write wins when altering the same attribute before next command?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I thought that was obvious but I can add a comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was mostly obvious, no comment needed

dateOfPurchaseAttribute.set(new Date());
// Explicit unset API
dateOfPurchaseAttribute.unset();
// ...
}
```

`client.ts`

```ts
import { searchAttributes } from '@temporalio/common';
import { Client, Q } from '@temporalio/client';
import { customerAttribute, dateOfPurchaseAttribute, myWorkflow } from './workflows';

const client = new Client(/* options */);

// Only available in OSS for now
await client.operator.registerSearchAttributes(customerAttribute, dateOfPurchaseAttribute);
// Other methods will be exposed too

const iterator = client.workflow.list({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question as I start to work on this in Python. So this solves temporalio/features#75 too? I assume this is interceptable? Also, can you show a quick example of status filtering? Is ExecutionStatus a search attribute?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will solve the high level SDK list API, assuming you want to add the query builder (you should IMHO).
I don't see much value in making it intercept-able if there are no headers to pass to the server.
ExecutionStatus along with the rest of the default search attributes will be defined in the SDK.

For search attributes that we know are enums I'd definitely restrict the type (I think it's only ExecutionStatus).

Something like:

Q.filter(builtinSearchAttributes.ExecutionStatus.eq('RUNNING'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much value in making it intercept-able if there are no headers to pass to the server.

While I thought similar, I was overridden on workflow handle "describe". I think we are now forced to be consistent here.

query: Q.filter(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does Q.filter produce a filter query as a string or an object?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object

// WorkflowType is a pre-defined search attribute
Q.and(searchAttributes.WorkflowType.eq(myWorkflow.name), customerAttribute.eq('customer-id-1234')).orderBy(
mjameswh marked this conversation as resolved.
Show resolved Hide resolved
dateOfPurchaseAttribute,
'ASC' // Optionally specify DESC | ASC
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks nice but I wouldn't underestimate the complexity of a query builder. This isn't a small amount of work to implement properly, especially if it's type-safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already built most of it while designing this and it's not my first time building one of these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it's not about difficulty of building, it's about maintenance and compatibility. How well defined are list filters? If you can give me a grammar that you guarantee won't change in an incompatible way, then I can support a programmatic builder. Even then though, though there's some type safety, you'll end up where SQL query builders always end up where people will escape back into string literals for extreme flexibility.

Also, this API needs to be fledged out some more. Why Q when it's a "list filter" in our docs? Why not a builder pattern? How can I embed a custom query string? Why is "orderBy" on a "query" part of something called a "filter"? (granted that's a Temporal naming problem).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with the server team and they've confirmed that the entire list of capabilities is documented here: https://docs.temporal.io/concepts/what-is-a-list-filter.

I'm not proposing this as the only way, The list API will accept a structured query and a plain string, we can of course make the query builder accept strings too but I don't think we need that at this point.

The doc page mentions list filters but it actually talks about queries, while the projection part of the query is omitted in Temporal, the rest is a query, it's composed of filters and ordering operators.
filter is not a what you get when you use Q.filter, it's just a factory for creating a new query with a single filter.
Alternatively, we could have a Q() method or constructor instead, I'm open to discussing that.

Essentially Q is a builder.

You can compose multiple operations:

Q.filter(...).filter(...).orderBy(...).orderBy(...)

NOTE that queries are immutable and filter and orderBy return new instances.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this builder solving a need people have?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding typos

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, to clarify I mean as in a struggle people have today. They can typo lots of strings, e.g. task queues, I am just wondering if anyone is clamoring for this.

Also, can we get an exact query grammar? (if not in this issue, just in docs somewhere)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc page I linked above.

I don't think people are clamoring for this, I just this it's a general improvement and is pretty straightforward to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not "clamoring", but I would have appreciated a query builder for the one spot in the server where I had to construct a query for internal use. Mostly just so I don't have to think about quoting or escaping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll clamor to avoid thinking about quotes in quotes 😊

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can constructing a filter always succeed? or should we have a users call a .build() method at the end to return any errors in the construction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They always succeed or throw on construction.

),
pageSize: 100, // optional
});

// Returns something like the TS wrapper returned by Describe - the API returns
// temporal.api.workflow.v1.WorkflowExecutionInfo
for await (const { workflowId, ...rest } of iterator) {
const handle = client.workflow.getHandle(workflowId);
await handle.cancel();
}

const handle = client.getHandle(someWorkflowId);
const info = await handle.describe();
const customer = info.getAttribute(customerAttribute); // includes runtime validation - returns string | undefined
// -- OR --
const customer = info.getAttribute<string>(customer); // no runtime validation - returns string | undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this argument supposed to be 'customer' ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for catching it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should standardize on the word "search attribute" or "attribute".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should probably just stick to search attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you'll do the same for memo here and for the in-workflow part? (though memo can be any type and is not part of your filter work)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to do this for memo, but I do think that the "upsert memo" API in the workflow module will do batching as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting a single memo value by key for a specific type makes sense though, right? info.getMemo(myStringMemo) and info.getMemo<string>("my-string-memo")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. right now memos can be any user object so we can't do runtime validation like we do for search attributes.
ATM users can access their memo as: info.memo['my-string-memo'] as string.

Adding the getter doesn't provide that much value IMHO but I'll consider adding it for API consistency with search attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We added workflow.memo and workflow.memo_value (and the client handle describe response equivalents) after discussion because there is no simple way to deserialize a memo value without manually invoking a converter.

No need to add it now, but I think making memo and search attributes APIs match as much as possible makes sense (though obviously the latter is much more limited on types).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we had type hints in TS I'd do it, but we currently don't support that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we mark this new API experimental until implemented in another SDK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should mark any new APIs experimental for a certain period IMHO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const customer = info.getAttribute<string>(customer); // no runtime validation - returns string | undefined
const customer = info.getAttribute<string>(customerAttribute); // no runtime validation - returns string | undefined
const customer = info.getAttribute('customer'); // no runtime validation - returns unknown

```

### What about the current APIs?

Search attributes are exposed in the SDK today in 3 places:

1. In the client - `WorkflowClient.describe()` return value (`WorkflowExecutionDescription.searchAttributes`)
2. In the workflow - `workflowInfo()` return value (`WorkflowInfo.searchAttributes`)
3. `upsertSearchAttributes` call from the workflow

We'll deprecate all of these APIs and point to the alternatives presented above.

If `upsertSearchAttributes` is called in the same activation after as the new `set|unset` APIs are used, it will trigger
flushing the buffered changes.