Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes allow listing mapped modules that resolve with an extension #88

Closed
wants to merge 1 commit into from

Conversation

jsumners-nr
Copy link
Contributor

PR #82 originally included code to ignore the resolved file extension. This PR restores that functionality.

PR elastic#82 originally included code to ignore the resolved file
extension. This PR restores that functionality.
@jsumners-nr
Copy link
Contributor Author

@trentm do you have time to take a look at this one?

@trentm
Copy link
Member

trentm commented May 6, 2024

@jsumners-nr Again sorry for the delay in looking at this.

I've gone back and forth on this a couple of times, and now I think the right answer here is to (a) NOT take this patch and (b) you should use this in your code (i.e. specify the .cjs extension in the hook arg):

const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
    // ...
});

Let me explain.


In an earlier look at this, I thought that this should work just like the existing "sub-module/foo" test:

test('require(\'sub-module/foo\') => sub-module/foo.js', function (t) {
t.plan(3)
const hook = new Hook(['sub-module/foo'], function (exports, name, basedir) {
t.equal(name, 'sub-module/foo')
return exports
})
t.on('end', function () {
hook.unhook()
})
t.equal(require('./node_modules/sub-module'), 'sub-module/index.js')
t.equal(require('./node_modules/sub-module/foo'), 'sub-module/foo.js')
})

This is what your patch would accomplish. It could also be accomplished with this cleaner patch to the handling that is already normalizing a require.resolved path by removing the .js extension:

diff --git a/index.js b/index.js
index 966b9c9..1b0cf68 100644
--- a/index.js
+++ b/index.js
@@ -47,7 +47,7 @@ if (Module.isBuiltin) { // Added in node v18.6.0, v16.17.0
 }

 // 'foo/bar.js' or 'foo/bar/index.js' => 'foo/bar'
-const normalize = /([/\\]index)?(\.js)?$/
+const normalize = /([/\\]index)?(\.(cjs|js))?$/

 // Cache `onrequire`-patched exports for modules.
 //

This normalization is effectively saying that a resolved load of ".../some-package/some-sub-path/foo.js" should be hookable via a Hook arg of some-package/some-sub-path/foo (i.e. without the extension). Likewise that a resolved load of ".../some-package/some-sub-path/bar/index.js" should be hookable via some-package/some-sub-path/bar.

The only difference with your case (with @langchain/core) is that the file extension is .cjs, right?

However Node's require() doesn't treat .cjs the same way it does .js. https://nodejs.org/api/modules.html#file-modules says:

If the exact filename is not found, then Node.js will attempt to load the required filename with the added extensions: .js, .json, and finally .node. When loading a file that has a different extension (e.g. .cjs), its full name must be passed to require(), including its file extension (e.g. require('./file.cjs')).

I think RITM should follow the same pattern and NOT normalize-away .cjs.

tl;dr

I think the correct RITM usage in this case is to specify the .cjs extension:

const hook = new Hook(['@langchain/core/dist/callbacks/manager.cjs'], function (exports, name) {
    // ...
});

Is there a reason you aren't able to do this in your code?


This should be tested and documented. I'll try to get a PR to test and doc this soon.

@trentm
Copy link
Member

trentm commented May 7, 2024

@jsumners-nr #89 is my proposed test and doc update to clarify how to hook package-internal .cjs files.

@jsumners-nr
Copy link
Contributor Author

I don't think that is going to work for us. We don't have separate instrumentations for modules loaded via require vs import. We rely on require-in-the-middle and import-in-the-middle to provide the module export for instrumentation. The issue is that folks like langchain are supplying separate CJS and ESM variants in one package via the exports map. All together, this means we supply a hook for the export name and expect to get back whatever that maps to.

@jsumners-nr
Copy link
Contributor Author

@trentm have you had a chance to think about this?

@trentm
Copy link
Member

trentm commented May 21, 2024

@jsumners-nr Can we get more concrete with the hooking code that you are trying to get to work? I think I'm missing something.

The IITM Hook modules arg doesn't support anything but the top-level module name, e.g. @langchain/core, IIUC. So, unless I'm missing something, there is no way to get IitmHook(['@langchain/core/dist/callbacks/manager'], ...) to work.

There is support for an internals: true option as well, but I don't think that's what you are referring to.

Do you have some other in-progress PR to IITM to support something like that?

Also, I wonder if one potential point of confusion between us here is referring to an entry-point to the @langchain/core module. It has this in its package.json#exports:

    "./callbacks/manager": {
      "types": {
        "import": "./callbacks/manager.d.ts",
        "require": "./callbacks/manager.d.cts",
        "default": "./callbacks/manager.d.ts"
      },
      "import": "./callbacks/manager.js",
      "require": "./callbacks/manager.cjs"
    },

That means @langchain/core/callbacks/manager is a hook path that should work with RITM.
However, in your example you are using @langchain/core/dist/callbacks/manager, which is NOT an entry-point defined by the module.

@jsumners-nr
Copy link
Contributor Author

I'm doing some more research based on the information you have provided. At the moment, I have only been able to refactor the test to replicate the issue I am actually trying to solve:

test('handles mapped exports: picks up allow listed resolved module', { skip: nodeSupportsExports }, function (t) {
  t.plan(3)

  let hookHit = false
  const hook = new Hook(['@langchain/core/callbacks/manager'], function (exports) {
    hookHit = true
    return exports
  })

  const { Tool } = require('@langchain/core/tools')
  const MyTool = class MyTool extends Tool {
    _call () {
      t.pass()
    }
  }

  const tool = new MyTool()
  tool.call('foo', [() => { t.pass() }])
  t.equal(hookHit, true)

  hook.unhook()
})

In short:

  1. We try to hook @langchain/core/callbacks/manager (a listed export)
  2. We require @langchain/core/tools (a listed export)
  3. We expect our hook to be hit because the tools import also imports the callback manager script (https://github.com/langchain-ai/langchainjs/blob/48398048de3ff4502b3434f607de338edfef8d10/langchain-core/src/tools.ts#L2-L7)

What we will see is that the tools script is ultimately trying to import @langchain/core/dist/callbacks/manager.cjs. This is shown in the debugger screenshot below.

The "fix" in this PR does not solve this. But before I go deep into solving it, what are your thoughts around this @trentm?

Debugger Screenshot

image

@trentm
Copy link
Member

trentm commented May 24, 2024

why it currently doesn't work for your case

Tracing the chain of loaded modules:

require('@langchain/core/tools')
// resolves via package.json#exports to node_modules/@langchain/core/tools.cjs, which does:
module.exports = require('./dist/tools.cjs');
// resolves to node_modules/@langchain/core/dist/tools.cjs, which does:
const manager_js_1 = require("./callbacks/manager.cjs");
// resolves to node_modules/@langchain/core/dist/callbacks/manager.cjs

DEBUG=require-in-the-middle debug output for that last "resolves to" (with newlines for some readability):

require-in-the-middle resolved filename "/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/dist/callbacks/manager.cjs" to
   module: @langchain/core (
    id: ./callbacks/manager.cjs,
    resolved: @langchain/core/dist/callbacks/manager.cjs,
    basedir: /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core) +0ms

So, with the current RITM code, you need one of the following Hook args to match:

  1. absolute path "/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/dist/callbacks/manager.cjs", obviously not relevant for your case,
  2. matching that "resolved" (internal var is fullModuleName): @langchain/core/dist/callbacks/manager.cjs

RITM's matching of an entry point is only looking at the string directly passed to the require(...) function, the id var.
So, entry-point matching only currently works for this directly:

require('@langchain/core/callbacks/manager')

and not with whatever internal require() call that happens to resolve to the same file path that some entry point refers to, e.g. require('./callbacks/manager.cjs'); from .../node_modules/@langchain/core/tools.cjs in this example.

implementation thoughts

Thinking about how RITM might implement what you are looking for.

  • Say, RITM gets a Hook arg @foo/bar/baz.
  • If id is a direct match, i.e. require('@foo/bar/baz'), then call the hook.
  • Otherwise, lazily, when there is a first require(...) that resolves to some file path under package @foo/bar, RITM would call require.resolve('@foo/bar/baz') to see if it resolves to some path, watching out for exceptions:
> require.resolve('@langchain/core/callbacks/manager')
'/Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/callbacks/manager.cjs'
> require.resolve('@langchain/core/bogus')
Uncaught:
Error [ERR_PACKAGE_PATH_NOT_EXPORTED]: Package subpath './bogus' is not defined by "exports" in /Users/trentm/tmp/asdf.20240502T113435/node_modules/@langchain/core/package.json
...
  • It would do this for every @foo/bar/* Hook arg it received.
  • Now, whenever there is a require(...) that resolves to a path inside @foo/bar, RITM checks to see if the full filename is one of those resolved paths, and if so, then call the hook (i.e. this is a new match).

I think this could work.

One subtle change in behaviour, that likely doesn't practically matter, is that this case (where the Hook arg includes both the entry-point and the package sub-module path) will match with with a different name value:

Hook(['@langchain/core/callbacks/manager', '@langchain/core/dist/callbacks/manager.cjs'], (name) => {
    console.log('Hook: name=%s', name)
})

I can't imagine that'll break anyone. If we're very conservative, that would be a major version bump.

@jsumners-nr
Copy link
Contributor Author

👋 @trentm checking in to see if you've had a chance to consider my finding.

@trentm
Copy link
Member

trentm commented Jun 5, 2024

checking in to see if you've had a chance to consider my finding.

@jsumners-nr Did you see my previous comment?

@jsumners-nr
Copy link
Contributor Author

checking in to see if you've had a chance to consider my finding.

@jsumners-nr Did you see my previous comment?

🤦 no. I failed to refresh the page before commenting and for some reason I never got a GitHub notification of an update.


Your idea sounds like a great solution. Would you like me to work on it?

@trentm
Copy link
Member

trentm commented Jun 7, 2024

Would you like me to work on it?

If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.

@jsumners-nr
Copy link
Contributor Author

Would you like me to work on it?

If you have the bandwidth, that would be great. It feels like I won't be able to get to this soon.

Thanks. I'll get it into my queue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants