Web version of the DataFrame viewer #10825

Merged
merged 28 commits into microsoft:main on Jul 19, 2022

sadasant
Contributor

@sadasant sadasant commented Jul 15, 2022

(The prior take on solving #9665 was PR: #10604)

Last month we tried to get the DataFrame viewer working on the web. In summary, we arrived at a set of bigger goals, available here: #10638. The first step does solve #9665, though with an imperfect approach that is nonetheless good enough to ship this feature and start us toward our longer-term goals.

The first step, solved in this PR, goes as follows:

Enable the Data Frame on the web (Fixing #9665)

  • By changing the IDataViewerDependencyService to accept an IKernel in addition to the PythonEnvironment that it receives today.
  • By adding a universal version of the dataViewerDependencyService that works only if an IKernel is provided and otherwise throws an error saying: Installation of "pandas" not supported while debugging. Please ensure you have "pandas" installed.
  • By keeping the current node-specific version of the dataViewerDependencyService, which extracts the interpreter from the IKernel if an IKernel is provided.
  • By ensuring the dataViewerDependencyService is registered in the web service registry, and removing the conditional that disables the Data Frame on the web.
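
The dispatch described in the bullets above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Kernel`, `Interpreter`, `resolveForWeb`, and `resolveForNode` are hypothetical, simplified stand-ins for `IKernel`, `PythonEnvironment`, and the two flavours of the service.

```typescript
// Hypothetical stand-ins for PythonEnvironment and IKernel (not the real types).
interface Interpreter {
    kind: 'interpreter';
    path: string;
}
interface Kernel {
    kind: 'kernel';
    interpreter?: Interpreter;
}
type Executer = Kernel | Interpreter;

const notSupportedWhileDebugging =
    'Installation of "pandas" not supported while debugging. Please ensure you have "pandas" installed.';

// Universal (web) flavour: works only when a kernel is provided, throws otherwise.
function resolveForWeb(executer: Executer): Kernel {
    if (executer.kind !== 'kernel') {
        throw new Error(notSupportedWhileDebugging);
    }
    return executer;
}

// Node flavour: uses the interpreter directly, or extracts it from the kernel.
function resolveForNode(executer: Executer): Interpreter | undefined {
    return executer.kind === 'kernel' ? executer.interpreter : executer;
}
```

The `kind` discriminator is an assumption made only so the sketch can narrow the union without runtime type checks; the real types are told apart differently.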

This PR contains those changes, and includes unit tests.

Fixes #9665

Feedback always appreciated!

@sadasant sadasant self-assigned this Jul 15, 2022
@codecov-commenter

codecov-commenter commented Jul 15, 2022

Codecov Report

Merging #10825 (3536b30) into main (d37e4a9) will increase coverage by 0%.
The diff coverage is 85%.

❗ Current head 3536b30 differs from pull request most recent head c3341d7. Consider uploading reports for the commit c3341d7 to get more accurate results

@@           Coverage Diff           @@
##            main   #10825    +/-   ##
=======================================
  Coverage     63%      63%            
=======================================
  Files        482      486     +4     
  Lines      33652    33736    +84     
  Branches    5488     5499    +11     
=======================================
+ Hits       21295    21413   +118     
+ Misses     10304    10274    -30     
+ Partials    2053     2049     -4     
Impacted Files Coverage Δ
src/platform/common/utils.node.ts 58% <ø> (-4%) ⬇️
src/telemetry.ts 100% <ø> (ø)
src/webviews/extension-side/dataviewer/types.ts 100% <ø> (ø)
src/platform/common/utils/localize.ts 76% <66%> (+<1%) ⬆️
src/platform/common/utils.ts 80% <75%> (-1%) ⬇️
...viewer/kernelDataViewerDependencyImplementation.ts 83% <83%> (ø)
...erpreterDataViewerDependencyImplementation.node.ts 84% <84%> (ø)
...eter/nbconvertInterpreterDependencyChecker.node.ts 95% <100%> (ø)
...rc/webviews/extension-side/dataviewer/constants.ts 100% <100%> (ø)
...ide/dataviewer/dataViewerDependencyService.node.ts 100% <100%> (+13%) ⬆️
... and 21 more

@sadasant sadasant marked this pull request as ready for review July 16, 2022 01:09
@sadasant sadasant requested a review from a team as a code owner July 16, 2022 01:09
constructor(private readonly applicationShell: IApplicationShell, private isCodeSpace: boolean) {}

protected async execute(command: string, kernel: IKernel): Promise<(string | undefined)[]> {
const outputs = await executeSilently(kernel.session!, command);
Member

The assumption that the kernel has a session (with the !) seems like it might be sneaking around the type system a bit. I don't see, at least at this point, an explicit reason the session would be undefined. Per the usage of this function, it looks like it should maybe just take a Session as a non-optional parameter? I don't believe anything else on the Kernel is used, and the caller should guarantee that there is a session.

Member

Yeah, I see the check at the start of the dependency check later in the code. Seems like after that you could just use the session.

Contributor Author

@sadasant sadasant Jul 18, 2022

This function is called only by a function that checks for the session, but I like your idea! Passing the session directly, rather than the kernel. Thank you 🙏
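
The suggestion in this thread (take the session rather than the kernel) can be illustrated with a small sketch. `Session`, `Kernel`, and the function names below are hypothetical and synchronous for brevity; the real `executeSilently` is asynchronous and the real types are richer.

```typescript
// Hypothetical, synchronous stand-ins; the real executeSilently is asynchronous.
interface Session {
    run(code: string): string[];
}
interface Kernel {
    session?: Session;
}

// Before: the callee takes the kernel and asserts the session exists with `!`.
function executeWithKernel(command: string, kernel: Kernel): string[] {
    return kernel.session!.run(command);
}

// After: the callee requires a Session, so the type system enforces the guarantee.
function executeWithSession(command: string, session: Session): string[] {
    return session.run(command);
}

// The caller performs the single check and passes the narrowed session along.
function getVersion(kernel: Kernel): string | undefined {
    if (!kernel.session) {
        return undefined;
    }
    return executeWithSession('get-version', kernel.session)[0];
}
```

With the second shape, forgetting the session check becomes a compile-time error in the caller instead of a runtime `TypeError` in the callee.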

Contributor Author

How about this? c3341d7

const pandasVersion = await this.getVersion(kernel);

if (pandasVersion) {
if (pandasVersion.compare(pandasMinimumVersionSupported) > 0) {
Member

Might be nice to have another telemetry event here? It's the type of thing that we can probably calculate using other events, but an explicit Pandas === Ok event might help in calculating how many Pandas failures we see.

Contributor Author

How about this? c0f10c7

export const kernelGetPandasVersion =
'import pandas as _VSCODE_pandas;print(_VSCODE_pandas.__version__);del _VSCODE_pandas';

function kernelPackaging(kernel: IKernel): '%conda' | '%pip' {
Member

@DonJayamanne might know best here. But this will use %pip for envs like poetry. In those cases should we just not do the install and ask them to install manually?

Contributor

Yes, agreed.

Contributor Author

What does that mean? How do I catch environments like poetry?

Contributor

I think it's unlikely this will ever pick 'conda' either. When we have an IKernel, we're in a remote situation. Otherwise we'd be using the interpreter.

I think we can leave this as is for now, and then figure out some other way to detect the presence of the package manager later.
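
For reference, the option raised earlier in this thread (skip the automatic install for environments such as poetry and ask for a manual install) might look roughly like the sketch below. It is not what was merged: `EnvType` and `installMagicFor` are hypothetical names, and how the environment type would be detected from a kernel is exactly the open question here.

```typescript
// Hypothetical environment labels; the extension's real environment types may differ.
type EnvType = 'conda' | 'venv' | 'poetry' | 'pipenv' | 'unknown';
type InstallMagic = '%conda' | '%pip';

// Returns the notebook magic to install with, or undefined when the user
// should be asked to install the package manually instead.
function installMagicFor(envType: EnvType): InstallMagic | undefined {
    switch (envType) {
        case 'conda':
            return '%conda';
        case 'venv':
            return '%pip';
        default:
            // poetry, pipenv, unknown: avoid guessing with %pip.
            return undefined;
    }
}
```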

sendTelemetryEvent(Telemetry.PandasTooOld);
// Warn user that we cannot start because pandas is too old.
const versionStr = `${pandasVersion.major}.${pandasVersion.minor}.${pandasVersion.build}`;
throw new Error(DataScience.pandasTooOldForViewingFormat().format(versionStr));
Contributor

Shouldn't the prompt be displayed at this point?
I.e., if the version is too old, shouldn't we display the prompt and ask to install?

sendTelemetryEvent(Telemetry.UserInstalledPandas);
}
} else {
sendTelemetryEvent(Telemetry.UserDidNotInstallPandas);
Contributor

These lines are still duplicated, and will be duplicated in 3 places again.

Here's my suggestion

protected async checkAndInstallMissingDependencies(token) {
    const pandasVersion = await this.getPandasVersion(token);
    if (pandasVersion) {
        if (pandasVersion.compare(pandasMinimumVersionSupportedByVariableViewer) > 0) {
            // Installed and recent enough: nothing to do.
            return;
        }
        sendTelemetryEvent(Telemetry.PandasTooOld);
    } else {
        sendTelemetryEvent(Telemetry.PandasNotInstalled);
    }
    // Missing or too old: prompt the user to install.
    await this.installMissingDependencies(token);
}

// Implemented by each child class (kernel, interpreter).
protected abstract getPandasVersion(token);

protected async installMissingDependencies(token) {
    sendTelemetryEvent(Telemetry.PythonModuleInstall, undefined, {
        action: 'displayed',
        moduleName: ProductNames.get(Product.pandas)!,
        pythonEnvType: interpreter?.envType
    });
    const promptResult = ....
    sendTelemetry(...)
    if (doNotInstall) {
        sendTelemetryEvent(....)
    } else {
        sendTelemetryEvent(....)
    }
}

This way most of the common code is in the base class, and the child classes inherit from it and only implement the code to provide the versions and fill in the blanks for installing.
All of the code for sending telemetry when the prompt is cancelled, accepted, or the like is then in one place.
Right now it's still copied around and duplicated, and I believe it can still be improved to avoid the duplication.
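
A self-contained sketch of the base-class idea described above, for illustration only. Every name in it is hypothetical (`BaseDependencyCheck`, `FakeCheck`, the `PandasOk` event, the `[0, 20, 0]` threshold), telemetry is recorded in an array instead of calling `sendTelemetryEvent`, and versions are plain tuples rather than `SemVer` objects. The `> 0` comparison mirrors the snippets in this PR.

```typescript
// All names here are hypothetical stand-ins, not the extension's real API.
type Version = [number, number, number];
type TelemetryName =
    | 'PandasOk'
    | 'PandasTooOld'
    | 'PandasNotInstalled'
    | 'UserInstalledPandas'
    | 'UserDidNotInstallPandas';

const minimumVersion: Version = [0, 20, 0]; // assumed threshold, for illustration only

function compareVersions(a: Version, b: Version): number {
    return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

abstract class BaseDependencyCheck<TExecuter> {
    // Stands in for sendTelemetryEvent so the sketch has no dependencies.
    public readonly events: TelemetryName[] = [];

    // Child classes only fill in these two blanks.
    protected abstract getVersion(executer: TExecuter): Promise<Version | undefined>;
    protected abstract promptAndInstall(executer: TExecuter): Promise<boolean>;

    // Shared flow: the version check, the prompt, and all telemetry live in one place.
    public async checkAndInstallMissingDependencies(executer: TExecuter): Promise<void> {
        const version = await this.getVersion(executer);
        if (version && compareVersions(version, minimumVersion) > 0) {
            this.events.push('PandasOk');
            return;
        }
        this.events.push(version ? 'PandasTooOld' : 'PandasNotInstalled');
        const installed = await this.promptAndInstall(executer);
        this.events.push(installed ? 'UserInstalledPandas' : 'UserDidNotInstallPandas');
    }
}

// Example child class backed by an in-memory fake instead of a kernel or interpreter.
interface FakeExecuter {
    version?: Version;
    accept: boolean;
}
class FakeCheck extends BaseDependencyCheck<FakeExecuter> {
    protected async getVersion(fake: FakeExecuter): Promise<Version | undefined> {
        return fake.version;
    }
    protected async promptAndInstall(fake: FakeExecuter): Promise<boolean> {
        return fake.accept;
    }
}
```

A kernel-backed and an interpreter-backed child would differ only in how they obtain the version and how they run the installation.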

Contributor Author

@sadasant sadasant Jul 18, 2022

These kinds of abstractions seem premature to me: they give the impression that the installer can become fully generic when it isn't ready for that. For example:

  • Why tie this to Pandas specifically?
  • How do we pass the kernel or interpreter around? Would we use an abstract type on the abstract class to determine the type of "evaluator" that will be passed along?
  • If it is not passed around, should the evaluator (kernel / interpreter) be part of the state of the class? If so, should we have some kind of setup function that ensures we always have that evaluator? We can't use the constructor for this at the moment.
    • Alternatively, should we move the instantiation of the *DataViewerDependencyImplementation to checkAndInstallMissingDependencies rather than keeping it in the constructor of the DataViewerDependencyService? That would come at the cost of instantiating a class on every checkAndInstallMissingDependencies call.

Contributor

I believe Don is just suggesting that shared code should be in a base class. Simply to eliminate duplicate code. The design should be relatively close to the same I'd think.

Contributor Author

Besides that, the behavior is not exactly the same with the kernel as with the interpreter, because the interpreter version has some of the installation logic abstracted out into the IInstaller.

I'd like to reach a compromise for this PR, then get the Plot Viewer working on the web, then come back to the debugger installer, and finally move everything to the IInstaller as the last step.

Contributor Author

So, what do you think of this?

export abstract class BaseDataViewerDependencyImplementation<TExecuter> implements IDataViewerDependencyService {
    protected abstract _getVersion(executer: TExecuter): Promise<string | undefined>;
    protected async getVersion(executer: TExecuter): Promise<SemVer | undefined> {
        try {
            const version = await this._getVersion(executer);
            return typeof version === 'string' ? parseSemVer(version) : version;
        } catch (e) {
            traceWarning(DataScience.failedToGetVersionOfPandas(), e.message);
            return;
        }
    }

Then

export class KernelDataViewerDependencyImplementation extends BaseDataViewerDependencyImplementation<IKernel> {
    protected async _getVersion(kernel: IKernelWithSession): Promise<string | undefined> {
        const outputs = await this.execute(kernelGetPandasVersion, kernel);
        return outputs.map((text) => (text ? text.toString() : undefined)).find((item) => item);
    }

I'm thinking I'd rather focus on telemetry wrappers than on splitting the steps into common functions.

Feedback appreciated!

Contributor Author

The above is just an idea. Once I understand what kind of approach/pattern to take, I can extrapolate

Contributor Author

Perhaps more like this?

    protected abstract _getVersion(executer: TExecuter): Promise<string | undefined>;

    @traceDecoratorWarn(DataScience.failedToGetVersionOfPandas())
    protected async getVersion(executer: TExecuter): Promise<SemVer | undefined> {
        const version = await this._getVersion(executer);
        return typeof version === 'string' ? parseSemVer(version) : version;
    }

Contributor Author

How does this look? sadasant#1

cc: @DonJayamanne , @IanMatthewHuff , @rchiodo

Contributor Author

I merged that PR!

throw new Error(DataScience.failedToInstallPandas());
}
} else {
sendTelemetryEvent(Telemetry.UserDidNotInstallPandas);
Contributor

Sending the telemetry after installing, when failing, or when not required to install is still duplicated, and I feel this can be moved into the base class.

}
sendTelemetryEvent(Telemetry.PandasTooOld);
// Warn user that we cannot start because pandas is too old.
const versionStr = `${pandasVersion.major}.${pandasVersion.minor}.${pandasVersion.build}`;
Contributor

This code can also be moved into the base class, all the implementations need to do is get the version and return it, and the base class can check if a version was available or not.

Contributor

@DonJayamanne DonJayamanne left a comment

I still think a lot of the code can be moved into a base class, even if the debugger implementation comes in later.

import { BaseDataViewerDependencyImplementation } from './baseDataViewerDependencyImplementation';

export const kernelGetPandasVersion =
'import pandas as _VSCODE_pandas;print(_VSCODE_pandas.__version__);del _VSCODE_pandas';
Contributor

Seems weird to have this on one line? Why not have line feeds?

Contributor Author

For the debugger I'll definitely need to work in line feeds, at least that was the case the last time I tried it. So, we'll get there 👍
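
One way to keep both forms available is to build the probe from its individual statements. This is only a sketch: the constant names below are hypothetical, and which form the debugger path actually needs is not settled in this thread.

```typescript
// The three statements of the pandas version probe, one per entry.
const pandasVersionStatements = [
    'import pandas as _VSCODE_pandas',
    'print(_VSCODE_pandas.__version__)',
    'del _VSCODE_pandas'
];

// Single-line form, equivalent to the constant in the diff above.
const kernelGetPandasVersionSingleLine = pandasVersionStatements.join(';');

// Multi-line form, with one statement per line.
const kernelGetPandasVersionMultiLine = pandasVersionStatements.join('\n');
```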

@sadasant
Contributor Author

Reported a (new?) flaky test: #10860

@sadasant
Contributor Author

Now I'm getting two different failures, one regarding DataScience Exports, and another one where the Log test results job Failed to print test summary. I'm assuming this PR is good to go. I'll merge it.

@sadasant sadasant merged commit 22d43ce into microsoft:main Jul 19, 2022
@sadasant sadasant deleted the web/9665-8 branch July 19, 2022 19:53
Development

Successfully merging this pull request may close these issues.

DataFrame working in web extension
5 participants