feat: Configurable list of json fields to mine patterns #14528

Merged: salvacorts merged 5 commits into main from salvacorts/configurable_tenant_json_pattern_fields on Oct 21, 2024

@salvacorts salvacorts commented Oct 18, 2024

What this PR does / why we need it:

This PR makes the hardcoded list of JSON fields that patterns are mined from configurable. This allows us to:

  • Add more fields as we learn they fit, without having to release a new version
  • Support customers with legitimate requests to add one of their fields
  • Remove fields that yield unhelpful patterns for a customer

This list is now configurable via the following overrides:

  • pattern_ingester_tokenizable_json_fields_default: It takes a comma-separated list of fields. By default we set it to the currently hardcoded list: log,message,msg,msg_,_msg,content.
  • pattern_ingester_tokenizable_json_fields_append: Also comma-separated. Appends its fields to the default list.
  • pattern_ingester_tokenizable_json_fields_delete: Comma-separated. Deletes fields from the union of the default and append lists (default ∪ append).
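
To make the semantics of the three overrides concrete, here is a minimal, self-contained sketch of how the effective list could be derived: the union of the default and append lists, minus the delete list. The helper names `splitFields` and `effectiveFields`, and the example field `body`, are assumptions made for illustration; this is not the code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFields parses a comma-separated list, trimming whitespace and
// skipping empty entries. Hypothetical helper, not taken from the PR.
func splitFields(csv string) []string {
	var out []string
	for _, f := range strings.Split(csv, ",") {
		if f = strings.TrimSpace(f); f != "" {
			out = append(out, f)
		}
	}
	return out
}

// effectiveFields returns (defaults ∪ appended) minus deleted. It keeps
// first-seen order and drops duplicates. Hypothetical helper.
func effectiveFields(defaults, appended, deleted []string) []string {
	drop := make(map[string]struct{}, len(deleted))
	for _, f := range deleted {
		drop[f] = struct{}{}
	}
	seen := make(map[string]struct{})
	out := make([]string, 0, len(defaults)+len(appended))
	for _, list := range [][]string{defaults, appended} {
		for _, f := range list {
			if _, isDeleted := drop[f]; isDeleted {
				continue
			}
			if _, isDup := seen[f]; isDup {
				continue
			}
			seen[f] = struct{}{}
			out = append(out, f)
		}
	}
	return out
}

func main() {
	defaults := splitFields("log,message,msg,msg_,_msg,content")
	appended := splitFields("body") // assumed tenant-specific field
	deleted := splitFields("content")
	// prints: [log message msg msg_ _msg body]
	fmt.Println(effectiveFields(defaults, appended, deleted))
}
```

In this sketch a field that appears in both the append and delete lists ends up deleted, which matches the "deletes from (default ∪ append)" wording above.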

The docs are hidden.

Special notes for your reviewer:

  • As far as I could tell, the tenant is passed down to the pattern.stream via the instanceID of the pattern.instance, which in turn comes from the pattern.Ingester at GetOrCreateInstance, where the instanceID is taken from the tenantID in the ctx.
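
The flow described in the note above can be sketched roughly as follows. Every type and function here (`tenantKey`, `limits`, `ingester`, `instance`, `stream`, `streamForTenant`) is a simplified stand-in invented for illustration. The real pattern package has its own types and resolves the tenant through its own helpers, so treat this only as a picture of the data flow, not as the actual implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// tenantKey is a hypothetical context key used only in this sketch.
type tenantKey struct{}

// limits stands in for per-tenant overrides: tenant ID -> JSON fields.
type limits map[string][]string

func (l limits) jsonFields(tenantID string) []string { return l[tenantID] }

// stream receives the tenant ID from its owning instance.
type stream struct {
	tenantID string
	fields   []string
}

// instance is keyed by instanceID, which is the tenant ID.
type instance struct {
	instanceID string
	limits     limits
}

func (i *instance) newStream() *stream {
	return &stream{tenantID: i.instanceID, fields: i.limits.jsonFields(i.instanceID)}
}

type ingester struct {
	limits    limits
	instances map[string]*instance
}

func newIngester(l limits) *ingester {
	return &ingester{limits: l, instances: map[string]*instance{}}
}

// getOrCreateInstance reads the tenant ID from ctx and uses it as instanceID.
func (ing *ingester) getOrCreateInstance(ctx context.Context) (*instance, error) {
	id, ok := ctx.Value(tenantKey{}).(string)
	if !ok || id == "" {
		return nil, errors.New("no tenant ID in context")
	}
	inst, ok := ing.instances[id]
	if !ok {
		inst = &instance{instanceID: id, limits: ing.limits}
		ing.instances[id] = inst
	}
	return inst, nil
}

// streamForTenant walks the whole chain: ctx -> ingester -> instance -> stream.
func streamForTenant(ing *ingester, tenantID string) (*stream, error) {
	ctx := context.WithValue(context.Background(), tenantKey{}, tenantID)
	inst, err := ing.getOrCreateInstance(ctx)
	if err != nil {
		return nil, err
	}
	return inst.newStream(), nil
}

func main() {
	ing := newIngester(limits{"tenant-a": {"log", "msg"}})
	s, err := streamForTenant(ing, "tenant-a")
	if err != nil {
		panic(err)
	}
	// prints: tenant-a [log msg]
	fmt.Println(s.tenantID, s.fields)
}
```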

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • Title matches the required conventional commits format, see here
    • Note that Promtail is considered to be feature complete, and future development for logs collection will be in Grafana Alloy. As such, feat PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@salvacorts salvacorts changed the title from "Configurable list of json fields to mine patterns" to "feat: Configurable list of json fields to mine patterns" Oct 18, 2024
@salvacorts salvacorts marked this pull request as ready for review October 18, 2024 11:48
@salvacorts salvacorts requested a review from a team as a code owner October 18, 2024 11:48
@@ -418,6 +420,9 @@ func (l *Limits) RegisterFlags(f *flag.FlagSet) {
f.IntVar(&l.BlockIngestionStatusCode, "limits.block-ingestion-status-code", defaultBlockedIngestionStatusCode, "HTTP status code to return when ingestion is blocked. If 200, the ingestion will be blocked without returning an error to the client. By Default, a custom status code (260) is returned to the client along with an error message.")

f.IntVar(&l.IngestionPartitionsTenantShardSize, "limits.ingestion-partition-tenant-shard-size", 0, "The number of partitions a tenant's data should be sharded to when using kafka ingestion. Tenants are sharded across partitions using shuffle-sharding. 0 disables shuffle sharding and tenant is sharded across all partitions.")

_ = l.PatternIngesterTokenizableJSONFields.Set("log,message,msg,msg_,_msg,content")
Contributor

Does this mean we need to redefine all of these in the limits if we want to add a new one alongside the existing ones?

Is the default list part of the doc/description so it is easy to reference in that case?

Contributor Author

Added default, append, and delete lists so we can better customize the default list per tenant.

@@ -91,7 +91,7 @@ func TestDrain_TrainExtractsPatterns(t *testing.T) {
},
},
{
- drain: New(DefaultConfig(), "", nil),
+ drain: New("", DefaultConfig(), &fakeLimits{}, "", nil),
Contributor

nit: define a tenantID constant for this, so it's obvious what is being passed here

Contributor Author

Done

@salvacorts salvacorts merged commit 7050897 into main Oct 21, 2024
60 checks passed
@salvacorts salvacorts deleted the salvacorts/configurable_tenant_json_pattern_fields branch October 21, 2024 08:53

2 participants