Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subscriber: add Targets filter, a lighter-weight EnvFilter #1550

Merged
merged 12 commits into from
Sep 12, 2021
364 changes: 364 additions & 0 deletions tracing-subscriber/src/filter/directive.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,364 @@
use crate::filter::level::{self, LevelFilter};
use std::{cmp::Ordering, error::Error, fmt, iter::FromIterator, str::FromStr};
use tracing_core::Metadata;
/// Indicates that a string could not be parsed as a filtering directive.
#[derive(Debug)]
pub struct ParseError {
kind: ParseErrorKind,
}
/// A directive which will statically enable or disable a given callsite.
///
/// Unlike a dynamic directive, this can be cached by the callsite.
#[derive(Debug, PartialEq, Eq, Clone)]
pub(crate) struct StaticDirective {
pub(in crate::filter) target: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, why do you prefer pub(in crate::filter) to `pub(super)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I try to avoid using super except in modules that are nested in the same file (like tests modules), because it's nice to be able to move the file around in the module tree and not have to fix imports and visibility modifiers.

pub(in crate::filter) field_names: FilterVec<String>,
pub(in crate::filter) level: LevelFilter,
}

#[cfg(feature = "smallvec")]
pub(in crate::filter) type FilterVec<T> = smallvec::SmallVec<[T; 8]>;
#[cfg(not(feature = "smallvec"))]
pub(in crate::filter) type FilterVec<T> = Vec<T>;

#[derive(Debug, PartialEq, Clone)]
pub(in crate::filter) struct DirectiveSet<T> {
directives: FilterVec<T>,
pub(in crate::filter) max_level: LevelFilter,
}

pub(in crate::filter) trait Match {
fn cares_about(&self, meta: &Metadata<'_>) -> bool;
fn level(&self) -> &LevelFilter;
}

#[derive(Debug)]
enum ParseErrorKind {
Field(Box<dyn Error + Send + Sync>),
Level(level::ParseError),
Other(Option<&'static str>),
}

// === impl DirectiveSet ===

impl<T> DirectiveSet<T> {
pub(crate) fn is_empty(&self) -> bool {
self.directives.is_empty()
}

pub(crate) fn iter(&self) -> std::slice::Iter<'_, T> {
self.directives.iter()
}
}

impl<T: Ord> Default for DirectiveSet<T> {
fn default() -> Self {
Self {
directives: FilterVec::new(),
max_level: LevelFilter::OFF,
}
}
}

impl<T: Match + Ord> DirectiveSet<T> {
pub(crate) fn directives(&self) -> impl Iterator<Item = &T> {
self.directives.iter()
}

pub(crate) fn directives_for<'a>(
&'a self,
metadata: &'a Metadata<'a>,
) -> impl Iterator<Item = &'a T> + 'a {
self.directives().filter(move |d| d.cares_about(metadata))
}

pub(crate) fn add(&mut self, directive: T) {
// does this directive enable a more verbose level than the current
// max? if so, update the max level.
let level = *directive.level();
if level > self.max_level {
self.max_level = level;
}
// insert the directive into the vec of directives, ordered by
// specificity (length of target + number of field filters). this
// ensures that, when finding a directive to match a span or event, we
// search the directive set in most specific first order.
match self.directives.binary_search(&directive) {
Ok(i) => self.directives[i] = directive,
Err(i) => self.directives.insert(i, directive),
}
}

#[cfg(test)]
pub(in crate::filter) fn into_vec(self) -> FilterVec<T> {
self.directives
}
}

impl<T: Match + Ord> FromIterator<T> for DirectiveSet<T> {
fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
let mut this = Self::default();
this.extend(iter);
this
}
}

impl<T: Match + Ord> Extend<T> for DirectiveSet<T> {
fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
for directive in iter.into_iter() {
self.add(directive);
}
}
}

// === impl Statics ===

impl DirectiveSet<StaticDirective> {
pub(crate) fn enabled(&self, meta: &Metadata<'_>) -> bool {
let level = meta.level();
match self.directives_for(meta).next() {
Some(d) => d.level >= *level,
None => false,
}
}
}

// === impl StaticDirective ===

impl StaticDirective {
pub(in crate::filter) fn new(
target: Option<String>,
field_names: FilterVec<String>,
level: LevelFilter,
) -> Self {
Self {
target,
field_names,
level,
}
}
}

impl Ord for StaticDirective {
fn cmp(&self, other: &StaticDirective) -> Ordering {
// We attempt to order directives by how "specific" they are. This
// ensures that we try the most specific directives first when
// attempting to match a piece of metadata.

// First, we compare based on whether a target is specified, and the
// lengths of those targets if both have targets.
let ordering = self
.target
.as_ref()
.map(String::len)
.cmp(&other.target.as_ref().map(String::len))
// Then we compare how many field names are matched by each directive.
.then_with(|| self.field_names.len().cmp(&other.field_names.len()))
// Finally, we fall back to lexicographical ordering if the directives are
// equally specific. Although this is no longer semantically important,
// we need to define a total ordering to determine the directive's place
// in the BTreeMap.
.then_with(|| {
self.target
.cmp(&other.target)
.then_with(|| self.field_names[..].cmp(&other.field_names[..]))
})
.reverse();

#[cfg(debug_assertions)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

{
if ordering == Ordering::Equal {
debug_assert_eq!(
self.target, other.target,
"invariant violated: Ordering::Equal must imply a.target == b.target"
);
debug_assert_eq!(
self.field_names, other.field_names,
"invariant violated: Ordering::Equal must imply a.field_names == b.field_names"
);
}
}

ordering
}
}

impl PartialOrd for StaticDirective {
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
Some(self.cmp(other))
}
}

impl Match for StaticDirective {
fn cares_about(&self, meta: &Metadata<'_>) -> bool {
// Does this directive have a target filter, and does it match the
// metadata's target?
if let Some(ref target) = self.target {
if !meta.target().starts_with(&target[..]) {
return false;
}
}

if meta.is_event() && !self.field_names.is_empty() {
let fields = meta.fields();
for name in &self.field_names {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: if you write for x in &xs here, then you can if let Some(x) = &ox above instead of ref target. To me, using & for iteration and match ergonomics feel very close semantically, and it is a tiny bit odd to see one, but not the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, I agree that this would be nicer. however, this code was already there; it was just moved around. I didn't modify the existing code that I moved in this PR except where it was strictly necessary for the change, for the sake of keeping the git history neater. i might come back and fix that up in a follow-up branch.

if fields.field(name).is_none() {
return false;
}
}
}

true
}

fn level(&self) -> &LevelFilter {
&self.level
}
}

impl Default for StaticDirective {
fn default() -> Self {
StaticDirective {
target: None,
field_names: FilterVec::new(),
level: LevelFilter::ERROR,
}
}
}

impl fmt::Display for StaticDirective {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let mut wrote_any = false;
if let Some(ref target) = self.target {
fmt::Display::fmt(target, f)?;
wrote_any = true;
}

if !self.field_names.is_empty() {
f.write_str("[")?;

let mut fields = self.field_names.iter();
if let Some(field) = fields.next() {
write!(f, "{{{}", field)?;
for field in fields {
write!(f, ",{}", field)?;
}
f.write_str("}")?;
}

f.write_str("]")?;
wrote_any = true;
}

if wrote_any {
f.write_str("=")?;
}

fmt::Display::fmt(&self.level, f)
}
}

impl FromStr for StaticDirective {
type Err = ParseError;
fn from_str(s: &str) -> Result<Self, Self::Err> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For from_str and other parsing routines, I find it very useful to add comments with examples of what is parsed:

// this parses
//   1 + 2

That’s a very efficient way to give a lot of context to the (usually somewhat tricky) code bellow. Examples usually work better than grammar, as you don’t need to decipher them,

let mut split = s.split('=');
hawkw marked this conversation as resolved.
Show resolved Hide resolved
let part0 = split
.next()
.ok_or_else(|| ParseError::msg("string must not be empty"))?;

if let Some(part1) = split.next() {
let mut split = part0.split("[{");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto for this other split

let target = split.next().map(String::from);
let mut field_names = FilterVec::new();
if let Some(maybe_fields) = split.next() {
let fields = maybe_fields
.strip_suffix("}]")
.ok_or_else(|| ParseError::msg("expected fields list to end with '}]'"))?;
field_names.extend(fields.split(',').filter_map(|s| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is this .split(',') correct? There's a .split(',') which the Target does:

https://github.com/matklad/tracing/blob/33f030f3f6a12ff6543e5969eea3ca549ab4dc7c/tracing-subscriber/src/filter/targets.rs#L257-L259

So, the two would conflict I believe?

With my "stuck writing parsers" hat on, I sort-of feel that neither regexes, nor .split are the most appropriate parsing tools in this case. I personally would reach for "let's write recursive descent parser by hand" here, sort-of how sevmer does this: https://github.com/dtolnay/semver/blob/master/src/parse.rs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that as this is currently used, there will never be commas inside the directive, because this FromStr impl is only called in Targets::from_str, which already splits on commas. However, I wanted to handle directives with field lists in this FromStr impl, because I thought it would eventually be nice to allow the user to manually parse individual StaticDirectives. Also, if people want to use field name filtering with Targets, we could eventually rewrite the Targets parser to not just split on commas and use recursive descent or something, so that commas can be nested within filter directives.

I wanted to get something simple working now, though, and go back and add support for field name filtering later.

if s.is_empty() {
None
} else {
Some(String::from(s))
}
}));
};
let level = part1.parse()?;
return Ok(Self {
level,
field_names,
target,
});
}

// Okay, the part after the `=` was empty, the directive is either a
// bare level or a bare target.
Ok(match part0.parse::<LevelFilter>() {
Ok(level) => Self {
level,
..Default::default()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I wouldn’t default here, to make sure we get compilation error when we add new field to consider if we need to extend grammar.

},
Err(_) => Self {
target: Some(String::from(part0)),
level: LevelFilter::TRACE,
..Default::default()
},
})
}
}

// === impl ParseError ===

impl ParseError {
pub(crate) fn new() -> Self {
ParseError {
kind: ParseErrorKind::Other(None),
}
}

pub(crate) fn msg(s: &'static str) -> Self {
ParseError {
kind: ParseErrorKind::Other(Some(s)),
}
}
}

impl fmt::Display for ParseError {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self.kind {
ParseErrorKind::Other(None) => f.pad("invalid filter directive"),
ParseErrorKind::Other(Some(msg)) => write!(f, "invalid filter directive: {}", msg),
ParseErrorKind::Level(ref l) => l.fmt(f),
ParseErrorKind::Field(ref e) => write!(f, "invalid field filter: {}", e),
}
}
}

impl Error for ParseError {
fn description(&self) -> &str {
"invalid filter directive"
}

fn source(&self) -> Option<&(dyn Error + 'static)> {
match self.kind {
ParseErrorKind::Other(_) => None,
ParseErrorKind::Level(ref l) => Some(l),
ParseErrorKind::Field(ref n) => Some(n.as_ref()),
}
}
}

impl From<Box<dyn Error + Send + Sync>> for ParseError {
fn from(e: Box<dyn Error + Send + Sync>) -> Self {
Self {
kind: ParseErrorKind::Field(e),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This impl feels odd to me: it seems like it is only applicable to one, specific call-site, but it is very very general, as return type is a poor man’s anyhow basically.

Ohhhhh, I now see that I can formulate a code-smell here:

having From for BarError, which is used only once.

the single call-site means that we can convert from Foo to Bar in one specific context. But the impl means that we can do that in any context. I think, in such situations it’s better to do the conversion manually.

This is philosophical, but what is somewhat more well-grounded is that ParseError is a public type, so this would be a public impl which the user could use. And such an impl is a very odd one for a public API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, this is code that previously existed, but was moved and renamed. I believe that this impl actually exists because a user requested the ability to create custom ParseErrors for some kind of testing purpose.

}
}
}

impl From<level::ParseError> for ParseError {
fn from(l: level::ParseError) -> Self {
Self {
kind: ParseErrorKind::Level(l),
}
}
}
Loading