chore: resolves locally referenced package exports #90

Closed
wants to merge 2 commits into from

Conversation

jsumners-nr
Contributor

This PR supersedes #88 and resolves #82.

@jsumners-nr
Contributor Author

@trentm how does this look to you?

Member

@trentm trentm left a comment

Sorry for the delay. It always takes me a long time to get my bearings in the guts of RITM's file-path manipulation, which scares me away.


I think your test is only passing by fluke.

I added this local change to your branch to (a) add some debug prints and (b) reduce test/mapped-exports.js to just your test case:
https://gist.github.com/trentm/809410f18c1ef3b0e70f8d70621d480c

Then I ran the following:

$ node test/mapped-exports.js
TAP version 13
# handles mapped exports: picks up allow listed resolved module

XXX doWork section: attempt to map `require(id="../helpers/util")` in "/Users/trentm/el/require-in-the-middle3/node_modules/zod/lib/locales/en.js" (filename="/Users/trentm/el/require-in-the-middle3/node_modules/zod/lib/helpers/util.js", moduleName="zod", fullModuleName="zod/lib/helpers/util")

XXX doWork section: attempt to map `require(id="./helpers/util")` in "/Users/trentm/el/require-in-the-middle3/node_modules/zod/lib/ZodError.js" (filename="/Users/trentm/el/require-in-the-middle3/node_modules/zod/lib/helpers/util.js", moduleName="zod", fullModuleName="zod/lib/helpers/util")

...

Most attempts to map a given relative require('./something') don't result in a hit.
Then, getting to the require that we care about for this test:

...
XXX doWork section: attempt to map `require(id="./callbacks/manager.cjs")` in "/Users/trentm/el/require-in-the-middle3/node_modules/@langchain/core/dist/tools.cjs" (filename="/Users/trentm/el/require-in-the-middle3/node_modules/@langchain/core/dist/callbacks/manager.cjs", moduleName="@langchain/core", fullModuleName="@langchain/core/dist/callbacks/manager.cjs")
XXX matchFound: mapped filename="/Users/trentm/el/require-in-the-middle3/node_modules/@langchain/core/dist/callbacks/manager.cjs" to allowlisted moduleName="@langchain/core/callbacks/manager"
ok 1 hook name matches
XXX exports for @langchain/core/callbacks/manager: {
  parseCallbackConfigArg: [Function: parseCallbackConfigArg],
  BaseCallbackManager: [class BaseCallbackManager],
  CallbackManagerForRetrieverRun: [class CallbackManagerForRetrieverRun extends BaseRunManager],
  CallbackManagerForLLMRun: [class CallbackManagerForLLMRun extends BaseRunManager],
  CallbackManagerForChainRun: [class CallbackManagerForChainRun extends BaseRunManager],
  CallbackManagerForToolRun: [class CallbackManagerForToolRun extends BaseRunManager],
  CallbackManager: [class CallbackManager extends BaseCallbackManager],
  ensureHandler: [Function: ensureHandler],
  TraceGroup: [class TraceGroup],
  traceAsGroup: [AsyncFunction: traceAsGroup]
}
  • This is attempting to see if the require("./callbacks/manager.cjs") in ".../node_modules/@langchain/core/dist/tools.cjs" maps to a package export that is listed in the Hook args. In our case the Hook args are ['@langchain/core/callbacks/manager'].
  • That require id maps to the file: .../node_modules/@langchain/core/dist/callbacks/manager.cjs. Note that this is in the "dist/" dir.
  • Your change suggests that this path is a match for the @langchain/core/callbacks/manager export.

That is not correct. The @langchain/core/callbacks/manager export, for a "require", maps to the path ".../node_modules/@langchain/core/callbacks/manager.cjs". Note that this is the file NOT in the "dist/" dir. It is specific to @langchain/core that ".../node_modules/@langchain/core/callbacks/manager.cjs" happens to contain:

module.exports = require('../dist/callbacks/manager.cjs');
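To make the mismatch concrete, here is a minimal sketch of how the resolved filename from the debug output splits into the `moduleName` and `fullModuleName` values seen in the log. The helper `splitModulePath` and the `/app` prefix are hypothetical, and this is not RITM's actual implementation. It assumes POSIX separators and, mirroring the log, drops a trailing `.js` while keeping `.cjs`.

```javascript
// Hypothetical helper, not part of require-in-the-middle.
// Splits an absolute POSIX filename under node_modules into the package
// name and the "full" module name (package name + path inside the package).
function splitModulePath (filename) {
  const marker = '/node_modules/'
  const idx = filename.lastIndexOf(marker)
  if (idx === -1) return null // not inside a node_modules directory

  const parts = filename.slice(idx + marker.length).split('/')
  // Scoped packages ("@scope/name") use two path segments for the name.
  const nameLen = parts[0].startsWith('@') ? 2 : 1
  if (parts.length < nameLen) return null

  return {
    moduleName: parts.slice(0, nameLen).join('/'),
    // As in the debug log: ".js" is stripped, ".cjs" is kept.
    fullModuleName: parts.join('/').replace(/\.js$/, '')
  }
}

const hit = splitModulePath(
  '/app/node_modules/@langchain/core/dist/callbacks/manager.cjs'
)
console.log(hit.moduleName) // @langchain/core
console.log(hit.fullModuleName) // @langchain/core/dist/callbacks/manager.cjs

// The Hook arg names the package export, which has no "dist/" segment,
// so a plain string comparison cannot match.
console.log(hit.fullModuleName === '@langchain/core/callbacks/manager') // false
```

The "dist/" segment is the whole difference: the loaded file and the package export are two different modules on disk.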

I think this may not be possible

Earlier in #88 (comment) I gave "implementation thoughts" that suggested what you want might be possible. I think I made a mistake there (the same "dist/" dir mistake). Let me try again.

  • Your goal is to be able to hook some part of the callbacks manager code in @langchain/core. For discussion, let's say it is the BaseCallbackManager class.
  • For CommonJS code that is implemented in ".../node_modules/@langchain/core/dist/callbacks/manager.cjs".
  • It should get hooked, no matter how the user indirectly causes that module to be loaded.
  • The @langchain/core/callbacks/manager export maps to ".../node_modules/@langchain/core/callbacks/manager.cjs".
  • Consider this call path. This loads the BaseCallbackManager implementation, but at no point is the filename for the @langchain/core/callbacks/manager export touched.
    • require('@langchain/core/tools') -> ".../node_modules/@langchain/core/tools.cjs"
    • require('./dist/tools.cjs') -> ".../node_modules/@langchain/core/dist/tools.cjs"
    • require("./callbacks/manager.cjs") -> ".../node_modules/@langchain/core/dist/callbacks/manager.cjs"

My conclusion is that there is no way to hook the BaseCallbackManager being loaded by using the @langchain/core/callbacks/manager argument to Hook().
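The call path above can be reproduced without @langchain/core. The sketch below builds a throwaway package with the same shape in a temp directory and checks which files Node actually loads. The package name `fake-core`, its files, and the `ritm-demo-` prefix are all hypothetical stand-ins; only the Node.js built-ins are real APIs.

```javascript
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { createRequire } = require('node:module')

// Throwaway package mimicking the layout discussed above. "fake-core" and
// all of its files are hypothetical stand-ins, not the real @langchain/core.
const root = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'ritm-demo-')))
const pkgDir = path.join(root, 'node_modules', 'fake-core')
const files = {
  'package.json': JSON.stringify({
    name: 'fake-core',
    exports: {
      './callbacks/manager': { require: './callbacks/manager.cjs' },
      './tools': { require: './tools.cjs' }
    }
  }),
  // Top-level shims: each one only re-exports the real code in dist/.
  'callbacks/manager.cjs': "module.exports = require('../dist/callbacks/manager.cjs')",
  'tools.cjs': "module.exports = require('./dist/tools.cjs')",
  // Implementation files; tools reaches the manager via a relative require.
  'dist/callbacks/manager.cjs': 'exports.BaseCallbackManager = class BaseCallbackManager {}',
  'dist/tools.cjs': "exports.manager = require('./callbacks/manager.cjs')"
}
for (const [rel, body] of Object.entries(files)) {
  const file = path.join(pkgDir, rel)
  fs.mkdirSync(path.dirname(file), { recursive: true })
  fs.writeFileSync(file, body)
}

// A require function that resolves as if called from <root>/app.js.
const appRequire = createRequire(path.join(root, 'app.js'))
const relToPkg = (file) => path.relative(pkgDir, file).split(path.sep).join('/')

// 1. The package export resolves to the top-level shim, not to dist/.
const exportTarget = relToPkg(appRequire.resolve('fake-core/callbacks/manager'))
console.log(exportTarget) // callbacks/manager.cjs

// 2. Loading the tools entry point pulls in the implementation ...
const tools = appRequire('fake-core/tools')
const loaded = Object.keys(appRequire.cache)
  .filter((file) => file.startsWith(pkgDir + path.sep))
  .map(relToPkg)
console.log(loaded)

// ... yet the file behind the package export was never loaded.
console.log(loaded.includes(exportTarget)) // false

fs.rmSync(root, { recursive: true, force: true })
```

Under these assumptions, a hook keyed on the export's file never sees the implementation when it is reached through the tools entry point, which is the situation described above.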

@@ -42,3 +42,27 @@ test('handles mapped exports: mapped-exports/bar', { skip: nodeSupportsExports }

hook.unhook()
})

test('handles mapped exports: picks up allow listed resolved module', { skip: nodeSupportsExports }, function (t) {
Member

skip needs to also skip out if node < 18, because that's the min node supported by @langchain/core being used for the test. That should get the other tests to run.
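One way to express that extra condition is sketched below. Only `process.versions.node` is a real Node.js API; `nodeMajor`, `langchainUnsupported`, and `shouldSkip` are hypothetical names, and the existing skip condition is passed in rather than assumed.

```javascript
// Only process.versions.node is a real Node.js API here; the variable and
// function names are hypothetical.
const nodeMajor = Number.parseInt(process.versions.node.split('.')[0], 10)

// The @langchain/core version used by the test needs Node.js 18 or newer.
const langchainUnsupported = nodeMajor < 18

// Combine the new condition with whatever the test already skips on.
function shouldSkip (existingSkip) {
  return Boolean(existingSkip) || langchainUnsupported
}

console.log(shouldSkip(false))
```

The result would then be passed as the `skip` option of the new test, so the remaining tests still run on older Node.js versions.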

Contributor Author

Issue addressed.

@jsumners-nr
Contributor Author

> That is not correct. The @langchain/core/callbacks/manager export, for a "require", maps to the path ".../node_modules/@langchain/core/callbacks/manager.cjs". Note that this is the file NOT in the "dist/" dir. It is specific to @langchain/core that ".../node_modules/@langchain/core/callbacks/manager.cjs" happens to contain:

I don't know what else I can do. Yes, it is possible that (package.json).exports.whatever does not map to the same file as ../dist/whatever. But with (package.json).exports in place, you can only require things from that package which are listed in that exports block. If the module has implemented the same "export" in two different ways under different paths, then they should not do that.

>   • Your goal is to be able to hook some part of the callbacks manager code in @langchain/core. For discussion, let's say it is the BaseCallbackManager class.
>   • For CommonJS code that is implemented in ".../node_modules/@langchain/core/dist/callbacks/manager.cjs".
>   • It should get hooked, no matter how the user indirectly causes that module to be loaded.

That, and the rest of the summary, is correct.

> My conclusion is that there is no way to hook the BaseCallbackManager being loaded by using the @langchain/core/callbacks/manager argument to Hook().

Here's what I know:

  1. Our fork (that we don't want) works. It is based on Handle mapped exports #82, but does not include your fix (as the fork was released prior to your review).
  2. This PR, chore: resolves locally referenced package exports #90, also solves the issue for the offending package. It passes every time for me locally. I'd be up for solving another test case if you can provide one that exhibits the behavior you are describing.

@trentm
Member

trentm commented Jul 18, 2024

> If the module has implemented the same "export" in two different ways under different paths, then they should not do that.

If I understand you correctly, @langchain/core is effectively doing this. The "callbacks manager" code is (a) accessible via the @langchain/core/callbacks/manager export, and (b) is accessed internally via relative imports of its implementation file in ".../dist/callbacks/manager.cjs". Whether they should do this or not is a matter of opinion, I guess. When the langchain "tools" code accesses the callbacks manager code it doesn't get there via the top-level "export". Therefore, for this case, having RITM resolve local references to a possibly matching package "export" won't ever help.

> I don't know what else I can do.

So, if one wants to hook the langchain "callbacks manager" code, regardless of what code path loads it, one needs to give RITM a hook arg that references a module that all those code paths go through. That module, in this case, is ".../node_modules/@langchain/core/dist/callbacks/manager.cjs".

As I argued at #88 (comment) the RITM hook arg for this should include the .cjs extension. So I'm suggesting:

Hook(['@langchain/core/dist/callbacks/manager.cjs'], ...)

which works now, IIRC.
I have this PR to document (and better test) that support: #89

On that PR you had this comment (repeated here):

> I understand the reasoning, but it would make instrumenting modules very complicated. The end user API of require-in-the-middle is that the export of whatever targeted module is provided to the hook function. The (package.json).exports map complicates this as it can resolve the same export name to multiple extensions. In my view, the hook function shouldn't need to know about that detail. It should just receive the right thing.

I think we got partially distracted by package "exports" for langchain. @langchain/core/dist/callbacks/manager, with the "dist/", isn't one of its exports, and the @langchain/core/callbacks/manager package export is not an import path that the @langchain/core/tools code path ever uses to reach the "callbacks manager" code.


I'm guessing a bit here, but perhaps you want to pass the same Hook arg to both RITM and IITM to hook the langchain "callbacks manager" code? If so, I don't think this will be possible. The @langchain/core ESM and CommonJS code are completely separate modules and, as shown above, there is no package export name that refers to the "callbacks manager" implementation code. The callbacks manager ESM code is .../@langchain/core/dist/callbacks/manager.js and the callbacks manager CJS code is .../@langchain/core/dist/callbacks/manager.cjs.

In general, there is no requirement that the ESM code and CJS code for this module have a similar path. In many packages that ship both ESM and CJS code, they are in separate directory trees. So wanting a single @langchain/core/dist/callbacks/manager name (without file extension) as a Hook arg to refer to both the ESM and CJS modules doesn't work.
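As an illustration of the separate-directory-tree case, the sketch below resolves one export name under the "import" and "require" conditions. It is a deliberately simplified model of conditional exports (Node.js also handles patterns, nested conditions, arrays, and null targets), and the `./widget` entry and its paths are hypothetical.

```javascript
// Simplified sketch of conditional "exports" resolution; real Node.js
// resolution handles more cases. The package layout below is hypothetical.
const exportsMap = {
  './widget': {
    import: './esm/widget.js',
    require: './cjs/lib/widget.cjs'
  }
}

function resolveExport (map, subpath, conditions) {
  const entry = map[subpath]
  if (entry === undefined) return null
  if (typeof entry === 'string') return entry
  // Conditions are checked in the order the package author wrote them.
  for (const [condition, target] of Object.entries(entry)) {
    if (condition === 'default' || conditions.includes(condition)) return target
  }
  return null
}

const esmTarget = resolveExport(exportsMap, './widget', ['node', 'import'])
const cjsTarget = resolveExport(exportsMap, './widget', ['node', 'require'])
console.log(esmTarget) // ./esm/widget.js
console.log(cjsTarget) // ./cjs/lib/widget.cjs
```

The two targets share neither a directory nor an extension, so no single extensionless file path can name both.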

So, I think for CJS you'll want '@langchain/core/dist/callbacks/manager.cjs':

var {Hook} = require('require-in-the-middle');
new Hook(['@langchain/core/dist/callbacks/manager.cjs'], (mod, name, baseDir) => {
    console.log('Hooked name=%j:', name, mod);
    return mod;
})
var tools = require('@langchain/core/tools')
console.log('Required @langchain/core/tools:', tools);

and for ESM, this seems to work for me:

use-langchain-tools.mjs

import * as tools from '@langchain/core/tools';
console.log('Imported @langchain/core/tools:', tools);

hook-setup.mjs

import {Hook} from 'import-in-the-middle';
new Hook(['@langchain/core'], {internals: true}, (mod, name, baseDir) => {
    if (name === '@langchain/core/dist/callbacks/manager.js') {
        console.log('Hooked name=%j: mod keys are:', name, Object.keys(mod));
    }
    return mod;
});

Using that:

% node --disable-warning=ExperimentalWarning --loader=import-in-the-middle/hook.mjs --import=./hook-setup.mjs use-langchain-tools.mjs
Hooked name="@langchain/core/dist/callbacks/manager.js": mod keys are: [
  'BaseCallbackManager',
  'BaseRunManager',
  'CallbackManager',
  'CallbackManagerForChainRun',
  'CallbackManagerForLLMRun',
  'CallbackManagerForRetrieverRun',
  'CallbackManagerForToolRun',
  'TraceGroup',
  'ensureHandler',
  'parseCallbackConfigArg',
  'traceAsGroup'
]
Imported @langchain/core/tools: [Module: null prototype] {
  BaseToolkit: [class BaseToolkit],
  DynamicStructuredTool: [class DynamicStructuredTool extends StructuredTool],
  DynamicTool: [class DynamicTool extends Tool],
  StructuredTool: [class StructuredTool extends BaseLangChain],
  Tool: [class Tool extends StructuredTool],
  ToolInputParsingException: [class ToolInputParsingException extends Error],
  tool: [Function: tool]
}

possible IITM feature

AFAICT, IITM doesn't support passing a module sub-path as a hook arg. That could be a feature, so that instead of requiring (a) {internals: true} and (b) the if (name === ...) check in the Hook callback, one could eventually have:

new Hook(['@langchain/core/dist/callbacks/manager.js'], (mod, name, baseDir) => {
    console.log('Hooked name=%j: mod keys are:', name, Object.keys(mod));
    return mod;
});
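Until something like that exists, the same effect could be approximated in user code. The sketch below is not an IITM API: `hookSubpaths` and `packageNameOf` are hypothetical helpers, and the `Hook` class is passed in by the caller. It assumes the constructor takes `(modules, options, hookFn)` and that `name` is the package-relative module name, as in the ESM example above.

```javascript
// Hypothetical helpers; `Hook` is whatever class the caller passes in and is
// assumed to accept (modules, options, hookFn) as in the ESM example above.
function packageNameOf (subpath) {
  const parts = subpath.split('/')
  // Scoped packages ("@scope/name") use two path segments for the name.
  return parts[0].startsWith('@') ? parts.slice(0, 2).join('/') : parts[0]
}

function hookSubpaths (Hook, subpaths, onHit) {
  const wanted = new Set(subpaths)
  // Hook each owning package once, with internals enabled.
  const packages = [...new Set(subpaths.map(packageNameOf))]
  return new Hook(packages, { internals: true }, (mod, name, baseDir) => {
    // Only forward the internal modules the caller asked for.
    if (wanted.has(name)) return onHit(mod, name, baseDir)
    return mod
  })
}
```

With that in place, a caller would write `hookSubpaths(Hook, ['@langchain/core/dist/callbacks/manager.js'], onHit)` and keep the name check out of the callback.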
