Ls performance changes #4891

Open
Stebalien opened this issue Mar 29, 2018 · 4 comments
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/commands Topic commands topic/perf Performance

Comments

@Stebalien
Member

Stebalien commented Mar 29, 2018

I'd like to make the following changes to the ls command for performance reasons:

  1. Add a -q flag that only returns names. Base58 encoding a bunch of CIDs is actually quite a time-consuming operation. It would be nice to have a variant that just returns names.
  2. Replace --resolve-types with --type=[always|local|never]. Currently, there's no way to say "don't resolve this type, even if we have the object". Having that option would allow us to avoid hitting the datastore for every file listed. (Note: --resolve-types will only be deprecated, not removed.)
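A rough sketch of how the three proposed values could behave. This is only an illustration: the names `ResolveType`, `ParseResolveType`, and `ShouldResolve` are hypothetical and not existing go-ipfs identifiers, and the reading of `always` (resolve even if that means fetching), `local` (resolve only when we already have the object), and `never` is my assumption about the intended semantics. The local check is passed as a function so that `never` does not touch the datastore at all.

```go
package main

import "fmt"

// ResolveType is a hypothetical enum for the proposed --type flag.
type ResolveType int

const (
	ResolveAlways ResolveType = iota // resolve the type, fetching if needed (assumed)
	ResolveLocal                     // resolve only if the object is already local (assumed)
	ResolveNever                     // never resolve, even if the object is local
)

// ParseResolveType maps a flag value to a ResolveType.
func ParseResolveType(s string) (ResolveType, error) {
	switch s {
	case "always":
		return ResolveAlways, nil
	case "local":
		return ResolveLocal, nil
	case "never":
		return ResolveNever, nil
	}
	return 0, fmt.Errorf("invalid --type value %q (want always, local, or never)", s)
}

// ShouldResolve decides whether to look up an entry's type.
// haveLocal is only called in "local" mode, so "never" avoids the datastore.
func ShouldResolve(mode ResolveType, haveLocal func() bool) bool {
	switch mode {
	case ResolveAlways:
		return true
	case ResolveLocal:
		return haveLocal()
	default:
		return false
	}
}

func main() {
	mode, err := ParseResolveType("never")
	if err != nil {
		panic(err)
	}
	lookups := 0
	haveLocal := func() bool { lookups++; return true }
	fmt.Println(ShouldResolve(mode, haveLocal), lookups)
}
```

The lazy `haveLocal` callback is the point of the design: with `never`, listing a large directory performs zero datastore lookups.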
@Stebalien Stebalien added kind/enhancement A net-new feature or improvement to an existing feature topic/commands Topic commands labels Mar 29, 2018
@kevina
Contributor

kevina commented Apr 2, 2018

Sounds good to me.

@mib-kd743naq
Contributor

> Base58 encoding a bunch of CIDs is actually quite a time-consuming operation.

Whoa... is this really the case? Using GMP bindings from a rather slow language on a crappy laptop, I get over 260,000 conversions per second of a random 70-byte string (chosen to cover potential super-long 512-bit CIDs). I suspect results from within golang will be even better. Or perhaps requiring a C library is itself an issue...?

~$ perl -Mwarnings -Mstrict -MBenchmark=:hireswallclock -MMath::GMPz -e '

  open my $fh, "<", "/dev/urandom";

  my $gmpz = Math::GMPz::Rmpz_init2_nobless( 8 * 70 );

  sub encode_base58 {
    my $cid = shift;

    Math::GMPz::Rmpz_import(
      $gmpz,
      length($cid),
      1, 1, 0, 0,   # order=1 (most significant first), size=1 byte, endian=0 (native), nails=0
      $cid,
    );

    my $rv = Math::GMPz::Rmpz_get_str( $gmpz, 58 );

    # GMPz uses a linear alphabet:                      0..9,"A".."Z","a".."v"
    # Bitcoin is more "human friendly": grep /[^0IOl]/, 0..9,"A".."Z","a".."z"
    $rv =~ tr
      {0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuv}
      {123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz}
    ;

    $rv;
  }

  timethis( -3, sub {
    sysread( $fh, my $bytes_representing_huge_cid, 70 );

    encode_base58( $bytes_representing_huge_cid );
  });
'
timethis for 3: 3.0228 wallclock secs ( 2.36 usr +  0.66 sys =  3.02 CPU) @ 265435.43/s (n=801615)
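For comparison, here is a minimal Go sketch of the same measurement using only the standard library. It is not the encoder go-ipfs actually uses; `EncodeBase58` is a hypothetical helper written for this illustration, built on `math/big` with the Bitcoin alphabet. Unlike the Perl one-liner, it also maps leading zero bytes to leading `1` characters. The loop includes the cost of reading random bytes, as the Perl version does, and no timing figure is claimed here since it depends on the machine.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"time"
)

// Bitcoin-style Base58 alphabet (no 0, I, O, or l).
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// EncodeBase58 is a hypothetical helper: it treats b as a big-endian
// integer, emits its base-58 digits, and encodes each leading zero byte as '1'.
func EncodeBase58(b []byte) string {
	zeros := 0
	for zeros < len(b) && b[zeros] == 0 {
		zeros++
	}
	n := new(big.Int).SetBytes(b)
	radix := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, radix, mod) // n = n / 58, mod = n % 58
		out = append(out, alphabet[mod.Int64()])
	}
	for i := 0; i < zeros; i++ {
		out = append(out, '1')
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	const n = 100000
	buf := make([]byte, 70) // same 70-byte input size as the Perl benchmark
	start := time.Now()
	for i := 0; i < n; i++ {
		if _, err := rand.Read(buf); err != nil {
			panic(err)
		}
		_ = EncodeBase58(buf)
	}
	elapsed := time.Since(start)
	fmt.Printf("%d encodings in %v (%.0f/s)\n", n, elapsed, float64(n)/elapsed.Seconds())
}
```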

@Stebalien
Member Author

> Whoa... is this really the case?

When listing a directory with 100,000 files, 100 files per ms is actually kind of slow.

@mib-kd743naq
Contributor

> When listing a directory with 100,000 files...

Given my figure of 265k/s, converting 100,000 CIDs, each twice the size of a typical one, should take roughly 380 ms. The actual time is likely a fifth of that, as Perl itself is embarrassingly slow. It was just faster to code a single command that anyone can paste into their terminal to validate things.

All I am trying to point out is that you are almost certainly solving the wrong problem: if Base58 conversion is a notable factor in your measurements, the converter is seriously flawed.
