do durational terms inside (any) operators produce reasonable summary stats? #42

chad-klumb · 2021-06-13T21:29:47Z

If not, this may be a problem for #19, because the targets formula may, in general, have durational terms (e.g. mean.age).

To give a trivial example,

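library(tergm)
# empty undirected network on 100 nodes
nw <- network.initialize(100, directed = FALSE)
# simulate 100 time steps and keep only the final network (with its tie ages)
nw <- simulate(nw ~ Form(~edges) + Diss(~edges), coef = c(-5, 4), dynamic = TRUE, time.slices = 100, output = "final")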

summary(nw ~ mean.age)
summary(nw ~ Form(~mean.age))
summary(nw ~ Diss(~mean.age))
summary(nw ~ Passthrough(~mean.age))
summary(nw ~ Sum(~mean.age, label = I))

produces

> summary(nw ~ mean.age)
mean.age 
37.61614 
> summary(nw ~ Form(~mean.age))
Form~mean.age 
            1 
> summary(nw ~ Diss(~mean.age))
Diss~mean.age 
            1 
> summary(nw ~ Passthrough(~mean.age))
Passthrough~mean.age 
            19458257 
> summary(nw ~ Sum(~mean.age, label = I))
mean.age 
19458257

The first value is presumably correct. The others, not so much. I am surprised that basically trivial cross-sectional operators don't work here (#19 would be a similarly trivial operator, not anything with its own complicated durational content).


chad-klumb · 2021-06-13T22:15:41Z

I tried edges.ageinterval(13) instead of mean.age and it is likewise inconsistent across the 5 cases above.

On the other hand, replacing mean.age with either Form(~edges) or Diss(~edges) produces the same value in all 5 cases. This seems correct for Passthrough and Sum, but is not what I would have expected for ~Form(~Form(~edges)) or ~Diss(~Form(~edges)), which (I would have expected) should somehow have 2 step memory.

library(tergm)
# same setup as in the first comment
nw <- network.initialize(100, directed = FALSE)
nw <- simulate(nw ~ Form(~edges) + Diss(~edges), coef = c(-5, 4), dynamic = TRUE, time.slices = 100, output = "final")

summary(nw ~ mean.age)
summary(nw ~ Form(~mean.age))
summary(nw ~ Diss(~mean.age))
summary(nw ~ Passthrough(~mean.age))
summary(nw ~ Sum(~mean.age, label = I))

summary(nw ~ Form(~edges))
summary(nw ~ Form(~Form(~edges)))
summary(nw ~ Diss(~Form(~edges)))
summary(nw ~ Passthrough(~Form(~edges)))
summary(nw ~ Sum(~Form(~edges), label = I))

summary(nw ~ Diss(~edges))
summary(nw ~ Form(~Diss(~edges)))
summary(nw ~ Diss(~Diss(~edges)))
summary(nw ~ Passthrough(~Diss(~edges)))
summary(nw ~ Sum(~Diss(~edges), label = I))

summary(nw ~ edges.ageinterval(13))
summary(nw ~ Form(~edges.ageinterval(13)))
summary(nw ~ Diss(~edges.ageinterval(13)))
summary(nw ~ Passthrough(~edges.ageinterval(13)))
summary(nw ~ Sum(~edges.ageinterval(13), label = I))

with results

> summary(nw ~ mean.age)
mean.age 
 36.3216 
> summary(nw ~ Form(~mean.age))
Form~mean.age 
            1 
> summary(nw ~ Diss(~mean.age))
Diss~mean.age 
            1 
> summary(nw ~ Passthrough(~mean.age))
Passthrough~mean.age 
            27238953 
> summary(nw ~ Sum(~mean.age, label = I))
mean.age 
27238953 
> 
> summary(nw ~ Form(~edges))
Form~edges 
      1237 
> summary(nw ~ Form(~Form(~edges)))
Form~Form~edges 
           1237 
> summary(nw ~ Diss(~Form(~edges)))
Diss~Form~edges 
           1237 
> summary(nw ~ Passthrough(~Form(~edges)))
Passthrough~Form~edges 
                  1237 
> summary(nw ~ Sum(~Form(~edges), label = I))
Form~edges 
      1237 
> 
> summary(nw ~ Diss(~edges))
Diss~edges 
      1191 
> summary(nw ~ Form(~Diss(~edges)))
Form~Diss~edges 
           1191 
> summary(nw ~ Diss(~Diss(~edges)))
Diss~Diss~edges 
           1191 
> summary(nw ~ Passthrough(~Diss(~edges)))
Passthrough~Diss~edges 
                  1191 
> summary(nw ~ Sum(~Diss(~edges), label = I))
Diss~edges 
      1191 
> 
> summary(nw ~ edges.ageinterval(13))
edges.age13toInf 
             934 
> summary(nw ~ Form(~edges.ageinterval(13)))
Form~edges.age13toInf 
                    0 
> summary(nw ~ Diss(~edges.ageinterval(13)))
Diss~edges.age13toInf 
                    0 
> summary(nw ~ Passthrough(~edges.ageinterval(13)))
Passthrough~edges.age13toInf 
                          31 
> summary(nw ~ Sum(~edges.ageinterval(13), label = I))
edges.age13toInf 
              31

Anyway, if we could get durational terms (primarily non-operators like mean.age, although operators are, I think, conceivable use cases) working inside non-durational operators, that would probably be enough for #19.

chad-klumb · 2021-06-13T22:40:10Z

Maybe related to use of S_FNs for durational terms like mean.age and edges.ageinterval?

Can these S_FNs assume that their lasttoggle auxiliaries have been initialized? The code is currently written that way, and well tested outside of operators, but maybe something changes when they're inside of operators and this assumption is no longer valid?

By the way, what are Z and W functions? Are those documented somewhere?

chad-klumb · 2021-06-13T22:59:02Z

Okay, might be due to calls to the C_FN outside of ticktock that are not currently handled. I'll see if I can add support for that.

chad-klumb · 2021-06-13T23:18:31Z

With d532a3b results look more reasonable:

> summary(nw ~ mean.age)
mean.age 
36.60097 
> summary(nw ~ Form(~mean.age))
Form~mean.age 
     35.89628 
> summary(nw ~ Persist(~mean.age))
Persist~mean.age 
        37.45492 
> summary(nw ~ Passthrough(~mean.age))
Passthrough~mean.age 
            36.60097 
> summary(nw ~ Sum(~mean.age, label = I))
mean.age 
36.60097

krivit · 2021-06-14T03:57:07Z

What durational terms' statistics mean inside operators that modify the network is an open question, and it's sometimes not 100% clear what they should be because they may depend on what happened before the last toggle.

w_ and z_ functions are mentioned in the terms API vignettes (though it looks like I never properly documented them). A w_ function writes the updated lasttoggle back into the ergm_state; a z_ function calculates empty-network statistics taking the extended state into account, such as when Form(~edges) starts with the count of the edges in the previous time step.

krivit · 2021-06-14T04:03:32Z

For the purposes of the release, we may have to prioritise:

Some terms, such as edges.ageinterval() need to work well within Diss() and other operators and produce sensible statistics outside of them, when used as targets.
Other terms, such as mean.age are typically used as targets, and we don't care that much how well they work inside operators.

Also, I'm pretty sure I've rigged things so that if you try to initialise a term that requires lasttoggle inside an operator that doesn't pass it through, it triggers a sensible error.

martinamorris · 2021-06-17T23:53:41Z

@krivit, @chad-klumb i think this issue needs to be resolved before release. either these durational terms can be used in operators, or they can't (for now). the docs need to state what is currently true.

krivit · 2021-06-18T08:35:12Z

@martinamorris, simply put, some combinations have ambiguous interpretations that we need to think about a lot more. The priority as I see it is to make sure that the change statistics work correctly inside operator terms, which as far as I can tell they do, and that summary statistics work well outside of dynamic operators (e.g., Form()) and inside supported non-dynamic operators such as Sum(). With that, we can use them in EGMME.

Before 4.0, some terms would outright refuse to be used in a summary or target stat formula, and others in the formation and dissolution formulas. What we have here is therefore already a uniform gain in functionality.

chad-klumb · 2021-06-18T17:27:42Z

I don't think the behavior is correct in general, either for changestats or summary stats. The issue is not limited to durational terms inside durational operators, but would also occur for durational terms inside some types of cross-sectional operators.

One basic problem is that propagating the top-level lasttoggle information on initialization and having terms inside operators take that at face value is fundamentally incorrect, at least for any reasonable interpretation of durational terms inside operators (including purely cross-sectional operators) that I can think of.

To make progress on this would require deciding what durational terms inside operators should mean. In cases where the operator evaluates its submodel on some transformation of the network (possibly depending on network history in the durational operator case), I think the behavior of durational terms should match that of cross-sectional terms, in that everything is defined with respect to this transformed network. For example, Form(~mean.age) should compute the mean age of ties in the union network with respect to the lasttoggle history of the union network, and Persist(~mean.age) should compute the mean age of ties in the intersection network with respect to the lasttoggle history of the intersection network. By initializing those histories to that of the top-level network, we invalidate both summary stats and change stats, in general.
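
A rough sketch of the distinction, in plain R rather than tergm's implementation. Everything here is an assumption made for illustration: the edge-set representation of the history, the helper names `tie_ages` and `transform_history`, the empty network before the first time step, and the convention that a tie formed at the current step has age 1.

```r
# Toy illustration only: plain R, no tergm internals.
# A history is a list of edge sets, one per time step; each edge is a "tail-head" string.
history <- list(
  c("1-2", "2-3"),          # t = 1
  c("1-2", "3-4"),          # t = 2
  c("1-2", "2-3", "3-4")    # t = 3 (current network)
)

# Age of each tie in the last network of `hist`: the number of consecutive
# time steps, ending at the last one, in which the tie was present.
tie_ages <- function(hist) {
  current <- hist[[length(hist)]]
  vapply(current, function(e) {
    present <- vapply(hist, function(es) e %in% es, logical(1))
    absent <- which(!present)
    if (length(absent) == 0) length(hist) else length(hist) - max(absent)
  }, numeric(1))
}

# History of a transformed network: combine each network with its predecessor
# (assumed empty before t = 1) using `op`.
transform_history <- function(hist, op) {
  prev <- c(list(character(0)), hist[-length(hist)])
  Map(op, prev, hist)
}
union_history <- transform_history(history, union)             # what Form() would see
intersection_history <- transform_history(history, intersect)  # what Persist() would see

mean(tie_ages(history))               # observed network: 2
mean(tie_ages(union_history))         # union network, own history: 8/3
mean(tie_ages(intersection_history))  # intersection network, own history: 1.5

# Tie "2-3" has age 1 in the observed history but age 3 in the union history,
# so seeding the union network with the top-level history gives the wrong age.
c(top_level = tie_ages(history)[["2-3"]], union = tie_ages(union_history)[["2-3"]])
```

The three means differ, which is the sense in which reusing the top-level lasttoggle history for a transformed network cannot be right in general.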

A similar problem can arise with purely cross-sectional operators. Suppose we have a network with two node types, A and B, occurring in random order. Define a cross-sectional operator X which retains only the A nodes and A-A ties, dropping the B nodes and any ties incident on one or more Bs. Assume also that X re-indexes the retained A nodes with an initial segment of the positive integers. Then lasttoggle information for durational terms inside X will be initialized incorrectly, because nodal indices in the top-level network do not match up to nodal indices in X's transformed network.
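
The index mismatch can be sketched in plain R. The data structures below are hypothetical (a data frame of last-toggle times keyed by node indices, and a helper `remap_lasttoggle`); they are not tergm's internal lasttoggle format, and the remapping shown is only one way such an API might look.

```r
# Toy illustration only: hypothetical data structures, not tergm's internals.
node_type <- c("B", "A", "A", "B", "A")   # top-level nodes 1..5

# Top-level last-toggle records, indexed by top-level node IDs.
lasttoggle <- data.frame(tail = c(2, 3, 1), head = c(3, 5, 2), time = c(10, 42, 7))

# Operator X keeps only the A nodes and re-indexes them as 1, 2, 3.
kept <- which(node_type == "A")               # top-level IDs 2, 3, 5
new_id <- match(seq_along(node_type), kept)   # NA for the dropped B nodes

# Initialization that respects X: remap indices, then drop dyads touching a B node.
remap_lasttoggle <- function(lt, new_id) {
  out <- data.frame(tail = new_id[lt$tail], head = new_id[lt$head], time = lt$time)
  out[!is.na(out$tail) & !is.na(out$head), , drop = FALSE]
}
remapped <- remap_lasttoggle(lasttoggle, new_id)

# Dyad (2, 3) in X's network is top-level dyad (3, 5). Reading the top-level
# table at face value looks up top-level dyad (2, 3) instead.
naive_time <- lasttoggle$time[lasttoggle$tail == 2 & lasttoggle$head == 3]
right_time <- remapped$time[remapped$tail == 2 & remapped$head == 3]
c(naive = naive_time, correct = right_time)   # 10 versus 42
```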

You can easily come up with cross-sectional examples where the nodes are the same but the edges are (in general) different, again showing that the current initialization is problematic.

To fix this problem in general would, I think, require at least as big a change as suggested in statnet/ergm#325 (comment). There may also be operators that the above discussion does not neatly apply to, which would only complicate the matter further.

Overall, I think fixing this now is not possible. The full scope of the problem should be carefully considered, and when a good solution is available, we can work on implementing it, but that will not be a 4.0 thing.

martinamorris · 2021-06-18T18:08:27Z

thanks for the detailed response @chad-klumb. i agree with you, and i think this is actually (another) paper.

so, the question remains:

1. do we want to remove the ability to use durational terms in operators for 4.0
2. or do we keep them, but make sure the docs say "we don't guarantee these work, and it's not clear how to interpret them"
3. or do we keep them, but make the docs say they can't be used at this point.

i'm in favor of 3.

chad-klumb · 2021-06-18T18:32:39Z

I think if they're made available, people will use them, even if we tell them not to. For that reason I'd prefer 1., but any option is fine with me, as long as we're clear about them not working in general.

martinamorris · 2021-06-18T18:49:03Z

How hard would it be to implement (1)?

chad-klumb · 2021-06-18T18:55:44Z

Actually, 1. would be problematic for removing offsets in EGMME (now done via operator terms) if a durational target is also included, so let's go with one of the other options :)

krivit · 2021-06-19T06:12:39Z

@chad-klumb , thanks for the detailed write-up. This is exactly what I mean when I write above that it's not always clear what durational statistics mean in the operators.

This is also why I only implemented extended state and signal propagation for a handful of operator terms in ergm (Sum, Offset, Passthrough, *, and :, IIRC): the network that these terms present to their submodels is an exact copy of the network passed to the operator, and they only manipulate the resulting change statistics. Trying to do it inside others will hopefully halt with an error.

As you write, operators that manipulate network nodes will require some kind of remapping API, and operators that manipulate edges will require precisely defining what is wanted.

This will have to wait until later.

martinamorris · 2021-06-21T17:18:53Z

What can't wait is the documentation.

What do we want that to say?

chad-klumb · 2021-06-21T19:57:44Z

I would say it's not supported and produces undefined behavior.

krivit · 2021-06-22T01:09:10Z

Agreed with @chad-klumb; in particular, we should warn the user not to nest temporal operators.

We probably also want to add some tests (e.g., Form() inside S() (Subgraph)) and make sure it actually errs rather than producing incorrect results.

martinamorris · 2021-06-22T01:52:52Z

Agreed with @chad-klumb; in particular, we should warn the user not to nest temporal operators.

By "nest temporal operators" are you referring to this:

Form(~mean.age)

krivit · 2021-06-22T02:05:41Z

No, more like Cross(~Form(~edges)) and similar silliness, but Form(~mean.age) probably won't work very well either. (I did check that Diss(~edges.ageinterval(2,4)) worked, at least for simulation.)

martinamorris · 2021-06-22T03:04:23Z

So I'm going to say that the use of durational terms during estimation is not currently supported but they can be used as targets, monitors, etc. Is that correct?

martinamorris · 2021-06-22T03:15:42Z

What's confusing me is that the stub currently says:

As currently implemented, the package does not support use of durational 
  terms during estimation (i.e., in the formation
  or dissolution models -- whether specified separably with \code{formation} and 
  \code{dissolution} arguments in the old style, or inside one of the new style temporal
  operator terms (\code{Form}, \code{Persist}, \code{Diss}, \code{Cross} or 
  \code{Change}).  But the durational terms may be used as targets, monitors,
  or summary statistics, and may also appear as non-separable effects 
  in the generative model (necessarily specified in the new style).

That reflects some partial editing. So I can't tell if these terms are

not supported for any Operator term estimation
supported in non-separable models, but not in separable models.

krivit · 2021-06-22T04:10:19Z

I think some of them might work at this point, but the capability is largely untested.

martinamorris · 2021-06-22T17:08:39Z

Ok, for now I'm going with this:

  As currently implemented, the package does not support use of durational 
  terms during estimation.  But the durational terms may be used as targets, monitors,
  or summary statistics.  The ability to 
  use these terms in the estimation of models is under development.

krivit · 2021-06-22T23:31:15Z

I think that's a bit misleading, since they are fine during estimation; it's just not always clear what they mean when inside operators.

martinamorris · 2021-06-23T02:44:30Z

If we can't say what they mean -- i.e., how to interpret the coefficients, I would say they are "unsupported". I'm imagining someone writing to statnet_help after fitting a model with these terms inside an operator, and asking us to explain how to interpret the results. If we can't do that, it's embarrassing.

This should just push us to understand, and incorporate in our materials, the interpretation of these terms.

krivit · 2021-06-23T03:23:42Z

The way I see it, "does not support" doesn't have the same connotations as "is unsupported".

martinamorris · 2021-06-23T04:01:08Z

ok then. for now, it's not a distinction we need to parse.

chad-klumb added a commit that referenced this issue Jun 13, 2021

initial tweak to address #42

d532a3b

martinamorris mentioned this issue Jun 21, 2021

Updating dissolution terminology in the .Rd files #34

Closed
