Replacement for tuxi: *oi* and *iris* #197

Open
BeyondMagic opened this issue Nov 17, 2021 · 23 comments

Comments

@BeyondMagic
Contributor

BeyondMagic commented Nov 17, 2021

Perhaps it should be rewritten or forked to keep it alive, as I think it can still be put to good use by some people or new projects.


iris is written in C++ and is supposed to be a replacement for both tuxi and oi.
oi is written in Rust and is supposed to be a replacement for tuxi.


Move your ideas, issues, and PRs to one of those projects instead of tuxi, as it is no longer actively developed and, for the most part, no longer works.

@Bugswriter
Owner

I think I managed this project very poorly and it can be very useful.
Today I needed a cli tool for just checking the meaning of a word. I found a lot of garbage vocab tools which are bloated af.
Then I realized, wait, I can use tuxi, because usually I just google stuff, but the define feature is not working. I really just want to find the best version of this program and revert the commits.

@BeyondMagic
Contributor Author

I really just want to find the best version of this program and revert the commits.

I'm thinking of rewriting the whole code in a readable manner. I'll start early next month and try to get the best performance and the most modular code that can be written in POSIX-compliant shell.

@PureArtistry
Collaborator

Really sorry I haven't been around for the past 7-8 months (had a bunch of IRL stuff going on, not been online at all)
Not sure if you want any of my input but if you need any help with the re-write or have any questions, let me know

Before I went away I wrote a fork of tuxi in rust, I have updated it this week to make sure everything is working in it again. If you just want a working tool for now you could use that.

If you check the selectors.rs file and all the files in the selectors directory, they contain all the updated IDs you need to feed pup to get the various bits of info from the html (saves you having to spend time in chrome dev tools looking them up yourself).

Again, sorry about kinda ruining this project, most of this is on me. Just let me know if you need anything, happy to either help or stay out of your way.

@simdimdim

simdimdim commented Dec 1, 2021

I'll try to lend a hand with small stuff. Just recently I thought I would never have to open youtube again thanks to tuxi (cuz with mpv and yt-dlp I could just play vids directly from the cli), but then just a few days later the youtube link parsing stopped working T.T

P.S. Hopefully you'd pick rust as the main language :D

@BeyondMagic
Contributor Author

BeyondMagic commented Dec 4, 2021

All right, so thinking through this over the last few weeks (while I was studying for a few exams) I came to the conclusion that we have basically two options here.

The first, and of course the easiest, would be to just create a new project with the same features tuxi offers now, but with simpler code in which we can easily maintain the selectors and the most essential functions. It would be written in shell script, so here are a few things to consider:

Pros:

  • Very modular -- anything can be added to it without much work;
  • Easier to code -- it wouldn't take two weeks to get the essentials tuxi has right now, especially with PureArtistry's list of selectors.

Cons:

  • Many limitations -- mainly regarding some features that are pretty nice for an assistant, as the description puts it, like intelligently reading what you're trying to find and using the best scraper-function to find it. Of course it is programmable, but shell script isn't the language for this type of thing.
  • Not a good tool for long-term ideas -- example: adding complex features will be very hard in the future, even assuming we already have a good code base.

The second would be to write it in a compiled language such as C, C++ or Rust (for performance and richer features, of course), so here are a few things to consider:

Pros:

  • Many features we can work with -- from scraping from multiple sources (not only Google) to intelligently reading the query to use the best scraper-function.
  • Performance -- It will just be faster assuming we code it in a good way.
  • Long term ideas -- it is feasible to have features that even good proprietary assistants have.

Cons:

  • Not so modular -- It will actually take a good amount of time to make the code base modular in those languages, to the point where adding a new source and new selectors isn't hard at all.
  • Hard to code -- note here: hard compared to shell, of course.

Let's not try to make this a vote-issue. We can definitely work through this with conversation.

Now, the thing about tuxi is that we feed it a query and it uses that query to try to find an answer. If we limit it to just Google for simplicity, we don't have an efficient tool, because it could do far more than it does. Of course, this creates a priority problem that we already faced before PR #162, but if we choose the second option, it wouldn't be too hard to fix: maybe we could add an option that tries to find the best source based on what the query says, maybe we could let the user choose the type of source they want to use, etc. It can have all of those features without losing performance; that's the point.
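
As a rough illustration of the two ideas in this paragraph (guessing a source from the query, and letting the user pick one explicitly), here is a minimal std-only Rust sketch. The `Source` variants, the keyword rules, and the `pick_source` name are all assumptions made up for this example; none of them come from tuxi, oi, or iris.

```rust
/// Hypothetical source kinds; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Source {
    Google,
    Lyrics,
    Dictionary,
}

/// Pick a source: an explicit user choice wins, otherwise guess from keywords.
fn pick_source(query: &str, user_choice: Option<Source>) -> Source {
    if let Some(source) = user_choice {
        return source;
    }
    let q = query.to_lowercase();
    if q.contains("lyrics") {
        Source::Lyrics
    } else if q.starts_with("define ") || q.contains("meaning of") {
        Source::Dictionary
    } else {
        // Fall back to the general-purpose source.
        Source::Google
    }
}

fn main() {
    for query in ["define serendipity", "hey jude lyrics", "weather in lisbon"] {
        println!("{:?} -> {:?}", query, pick_source(query, None));
    }
    // An explicit choice overrides the keyword guess.
    println!("{:?}", pick_source("hey jude lyrics", Some(Source::Google)));
}
```

The order of the keyword checks is where the priority problem shows up: a query matching several rules goes to whichever rule is tested first, so a real tool would need a more deliberate ranking than this.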

Of course I'm not an expert, nor knowledgeable enough to write the best code from scratch in C, C++ or Rust (maybe nobody here is), but as long as we can keep improving it, that won't be much of an issue.

Waiting for your responses. If nobody responds (sadly, that is a possibility), I'll just choose the second option with Rust as the language -- maybe not...

@BeyondMagic
Contributor Author

And, @simdimdim, I believe there's a cli tool already for that.

@PureArtistry
Collaborator

@BeyondMagic

Hey dude do you want to check this out?

It basically is tuxi but written in rust, I've just updated it and it should be bug free (as far as I'm aware).
It's pretty modular as far as adding more google snippets goes. It wouldn't be too difficult to get info from other sources, but some thought would need to go into the UX part of that.

Let me know what you think of it, any improvements to be made, stuff missing, could you work with it, etc

@BeyondMagic
Contributor Author

BeyondMagic commented Dec 5, 2021

@PureArtistry yeah, it is pretty nice. Could we go further than that with the modular thing and make it so we don't need a rust file for every module, or is that out of the question? Actually, I retract that question, since maybe it is better to handle each selector differently. Now, for the sources (which would be another module), it shouldn't be too hard to organize them: first a name [string], a url [string] and options? [array:[opt1:value], [opt2:value]] like language and such...
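
The source layout proposed here (name, url, optional key/value options) could look roughly like the following in Rust. This is only a sketch under assumptions: the field names, the `{query}` placeholder convention, the `request_url` helper, and the example URL are invented for illustration and are not taken from oi.

```rust
/// A search source: a name, a URL template, and optional settings.
/// Field names and the `{query}` placeholder are assumptions for this sketch.
#[derive(Debug, Clone)]
struct Source {
    name: String,
    url: String,
    options: Vec<(String, String)>,
}

impl Source {
    /// Build the request URL for a query, appending each option as a parameter.
    fn request_url(&self, query: &str) -> String {
        // Minimal escaping for the sketch: only spaces are replaced. A real
        // implementation would percent-encode the whole query.
        let mut url = self.url.replace("{query}", &query.replace(' ', "+"));
        for (key, value) in &self.options {
            url.push('&');
            url.push_str(key);
            url.push('=');
            url.push_str(value);
        }
        url
    }
}

fn main() {
    let source = Source {
        name: "example".to_string(),
        url: "https://example.com/search?q={query}".to_string(),
        // A language option, as suggested above; the key name is hypothetical.
        options: vec![("lang".to_string(), "en".to_string())],
    };
    println!("{}: {}", source.name, source.request_url("rust language"));
}
```

Keeping sources as plain data like this means a new source would not need its own Rust file, while the selectors can still be handled per module.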

Since you're already programming it in Rust, we definitely could use it as a base. Maybe the name too if you want to. The thing is, the word 'oi' is 'hi' in Portuguese, kinda strange to type it in; what do you think about "iris"? So we can create an organization already (Edit: though there's already someone named iris on Github...)

@PureArtistry
Collaborator

How we handle other sources in the code is something to think about when we actually have other things to add (open to ideas for that btw)

As for the name, oi is what I used to alias tuxi to, because tuxi is a little awkward to type on a qwerty keyboard, whereas o and i are right next to each other. Also, oi in English is used kinda like hi, as a way to grab someone's attention to ask them a question.

I don't mind changing the name (or using tuxi) - I do quite like oi though.
Why iris? (especially with it already being used)

@BeyondMagic
Contributor Author

BeyondMagic commented Dec 5, 2021

Iris is such a cool name... that's pretty much why.

oi is already taken as well, so just adding a prefix or suffix to it should be fine, just for the organization's name.

And to note, I'll already start looking at the source code to see how everything is being handled so I can start recreating/maintaining it and add more sources (primarily for lyrics since Google doesn't scrape most of the websites).

@simdimdim

simdimdim commented Dec 6, 2021

The reason I suggested rust is because I've already worked on something similar (using different selectors according to a site/purpose) here(trait) and here and here(usage).

The concept is to implement a trait for a new struct/selector (in an add-on), then just stick it in a HashMap<domain, Box>. In the case of an assistant it should be more along the lines of Map<(command,domain), Box>, I guess, but the idea is the same.
(It's also possible to have 'default' implementations of the trait via super traits for groups of domains requiring similar selectors.)
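
A minimal std-only sketch of that trait-plus-map idea, keyed by (command, domain), might look like this. The `Scraper` trait, the `Between` struct, and the marker-based extraction are assumptions for illustration; they are not the code from the linked project, and a real implementation would use an html parsing crate instead of string markers.

```rust
use std::collections::HashMap;

/// Hypothetical trait: one implementation per kind of answer.
trait Scraper {
    /// Extract an answer from raw page text, if this scraper recognises it.
    fn scrape(&self, page: &str) -> Option<String>;
}

/// Toy scraper that returns the text between two markers.
struct Between {
    start: &'static str,
    end: &'static str,
}

impl Scraper for Between {
    fn scrape(&self, page: &str) -> Option<String> {
        let from = page.find(self.start)? + self.start.len();
        let len = page[from..].find(self.end)?;
        Some(page[from..from + len].trim().to_string())
    }
}

/// Scrapers are looked up by (command, domain).
type Registry = HashMap<(&'static str, &'static str), Box<dyn Scraper>>;

fn registry() -> Registry {
    let mut map: Registry = HashMap::new();
    map.insert(
        ("define", "example.com"),
        Box::new(Between { start: "<dfn>", end: "</dfn>" }),
    );
    map
}

fn main() {
    let scrapers = registry();
    let page = "<p><dfn> a greeting </dfn></p>";
    if let Some(scraper) = scrapers.get(&("define", "example.com")) {
        println!("{:?}", scraper.scrape(page));
    }
}
```

Adding support for a new site or command is then one more struct implementing `Scraper` plus one more `insert` call.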

P.S. As an anime fan I find 'oi' (it's basically a 'hey') quite cute as a name.
P.S.S. @BeyondMagic Every time I come across something related fzf I find I can respect a project even more than previously believed to be the most one can respect a script. xD
P.S.S.S For those who attempt to read my code, apologies for it being such a mess 🙇

Lastly, I'm more in favor of using a compiled language, specifically rust: requests and an html parsing lib are very easy to work with in my opinion.

@BeyondMagic
Contributor Author

BeyondMagic commented Dec 6, 2021

My only fear with writing in Rust, and thereby relying on libraries as dependencies, is dependency hell. A much bigger ecosystem like JS with npm makes this problem easy to create; in Rust it isn't that easy for now, but how long until it gets there? For a project like a cli assistant, it shouldn't be too hard if we're managing only simple things, the rawest of the raw, nothing more than that. But when we have to create pretty outputs for some scraper-functions, like tables, we have to rely on dependencies. Taking the current oi as an example, there are 150+ packages we need to build; that's a lot for a cli assistant, and in Rust I believe that is kinda what will happen eventually.

I want to be careful here, because comparing this to C or C++, we know that there it's pretty complicated to actually do that, while in Rust the cost of memory safety (+ certain things) creates this other window of bad possibilities. But then there's that thing, a small thing as I see it, which is other people being able to contribute to such a project. Rust, for now, is a more popular language, but is that really important when the tool just does what it can do in the best way? I think not.

Is C++ or C still on the table? I would say yes: it is efficient and, although slower to develop since we would be starting from ground zero, it is still something that I/(we) can make good in almost every possible way... so, any thoughts?


The reason I suggested rust is because I've already worked on something similar (using different selectors according to a site/purpose) here(trait) and here(usage).

That's actually a pretty cool project, and the code is similar to PureArtistry's; so I guess that's how Rust is able to handle this.

@simdimdim

simdimdim commented Dec 6, 2021

In my opinion rust is much easier to work with (closer to python's capacity for rapid development) than C/++ is. As for dependencies, as long as one doesn't have any git repositories or wildcard * versions, there's virtually no chance for breakage to occur as far as I'm aware. There's also still a Cargo.lock for extreme cases. 150 packages don't sound like a lot to me (I'm expecting 200-400 to be normal for a mature project), since most of them are just dependencies of dependencies. For the most part I'd say we need just reqwest, select, tokio, url, futures and maybe chrono, plus quality of life stuff like log and itertools (and maybe something that integrates with OS commands for external app execution).
As far as pretty print goes, I feel this should be an extremely easy problem to solve, if it can even be called that. If the output of the scrapers could be limited to a few general types such as text/article, table, picture, url (in short, a limited number of types, which in other words separates the scraping from the output/printing), handling those would be easy and predictable, no?
There's also a whole bunch of cli libs available with different levels of abstraction, tools and capabilities.
I'd say it's way (way) more likely to run into library version mismatches in C/++ than rust. + I feel rust project management is also easier than C++. + Writing multi-platform code in rust feels like it should be easier compared to C/++.
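
To make the idea of limiting scraper output to a few general types more concrete, here is a small std-only Rust sketch. The `Output` enum, its variants, and the `render` function are assumptions chosen for this example; the point is only that scrapers return data and a single function decides how to print it.

```rust
/// A small, closed set of result shapes; the variants are assumptions.
#[derive(Debug, PartialEq)]
enum Output {
    Text(String),
    Url(String),
    Table(Vec<Vec<String>>),
}

/// Render any output to a plain string; scrapers never print directly.
fn render(output: &Output) -> String {
    match output {
        Output::Text(text) => text.clone(),
        Output::Url(url) => format!("<{}>", url),
        // One line per row, cells separated by " | ".
        Output::Table(rows) => rows
            .iter()
            .map(|row| row.join(" | "))
            .collect::<Vec<_>>()
            .join("\n"),
    }
}

fn main() {
    let table = Output::Table(vec![
        vec!["unit".to_string(), "value".to_string()],
        vec!["km".to_string(), "1.6".to_string()],
    ]);
    println!("{}", render(&table));
    println!("{}", render(&Output::Url("https://example.com".to_string())));
}
```

With this split, swapping the plain-text table for a prettier one only touches `render`, and that is also the only place where a table-drawing dependency would be needed.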

That's actually a pretty cool project, and the code is similar to PureArtistry's; so I guess that's how Rust is able to handle this.

Thanks, I have been working on it on and off offline for some time, had more time lately so will probably finish the file handling (on disc and freshly downloaded) soon. And yes, rust ain't quite like C/++, but I'm yet to hear of it being unable to handle stuff C/++ can. It just does so via different tools/concepts.

P.S. I feel like I sound like quite the rust fanboy, but it's not due to my belief it's a great language, rather it's because I'm yet to run into any nastiness while working with it. But in the end language is also just a tool and it is also my belief one should choose the optimal tool for the job, (so in the end it's a matter of determining what the 'job' is I guess :D)

@BeyondMagic
Contributor Author

The job for now is the essentials: scrape a page, then loop through its content with the selectors and print out what was found. For something so simple, of course the best choice is shell script.
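
For comparison, the same essential job (take a page, loop over the selectors, print the first thing found) can be sketched in std-only Rust as below. This is a simplified stand-in under assumptions: the `Selector` struct uses start/end text markers instead of real CSS selectors, the page is a string literal rather than a fetched document, and all names are hypothetical.

```rust
/// Stand-in for a CSS selector: a named pair of start/end markers.
struct Selector {
    name: &'static str,
    start: &'static str,
    end: &'static str,
}

/// Try each selector in order and return the first one that matches.
fn first_match<'a>(page: &'a str, selectors: &[Selector]) -> Option<(&'static str, &'a str)> {
    selectors.iter().find_map(|sel| {
        let from = page.find(sel.start)? + sel.start.len();
        let len = page[from..].find(sel.end)?;
        Some((sel.name, page[from..from + len].trim()))
    })
}

fn main() {
    let selectors = [
        Selector { name: "math", start: "<span id=\"calc\">", end: "</span>" },
        Selector { name: "define", start: "<dfn>", end: "</dfn>" },
    ];
    // A fetched page would go here; a literal keeps the sketch offline.
    let page = "<div><dfn>oi: an informal greeting</dfn></div>";
    match first_match(page, &selectors) {
        Some((name, text)) => println!("[{}] {}", name, text),
        None => println!("no result"),
    }
}
```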

But eventually we'll go further than that: making a prettier output may rely on other things, multiple sources is an idea, and long-term ideas like these are better suited to compiled languages. And using oi for the follow up example: what can Rust do that C/++ can't, besides being 55mb in binary size and having far more libraries/dependencies? Easier to code, modularity of dependencies?

I think I'm relying on the idea that Rust isn't a replacement for C/++ here; we should use it if its benefits outweigh the disadvantages. Nonetheless, relative is relative: for one, easier/faster to code is a feature that can easily become a disadvantage in this situation when we talk about performance. And dependencies? By npm standards, 150+ packages is nothing; by C/++ standards, that's far more libraries than even big C/++ projects have.

@simdimdim

As far as I'm aware c/++ and rust are comparable in performance (that's what most articles on the Internet say anyway). Also as far as I'm aware there's little essential difference between what c/++ can do and rust can't or vice versa. Not sure what that 55mb binary is about. With rust I'd say if the code compiles you only need to look for logical errors, meanwhile, c/++ is just able to compile quicker, but you still have to keep a lot of things in mind. Also multi-threading in rust is way easier.
I'd like to also point out that 150+ packages does not mean 150+ dependencies. There's usually not a very large number of direct dependencies (10, maybe 20, is what I've usually seen in projects); it's just that the dependencies have their own dependencies, but we, as the library's users, don't need to worry about them. If a library is on crates.io it's assumed to at least compile without errors, which in rust translates to 'unless it has some dubious logical errors it's fine'. The other day I happened upon how easy it is to configure whether a lib should compile with an additional feature or not, so I'd say rust definitely can handle modularity. In conclusion, I'd say rust would offer a lot more stable development than C/++, even if it's still a new technology and doesn't have a monstrous amount of completely high-end libraries like C/++.

@BeyondMagic
Contributor Author

Multi-threading for a cli assistant is too much for what it is supposed to do, it's not happening; it is a less-than-a-second run, let's keep it simple there because it should be simple here.

I'd like to also point out that 150+ packages, does not mean 150+ dependencies, there's usually not a very large number of dependencies (10 maybe 20 is what I've seen in projects usually) it's just that the dependencies have their own dependencies,

I mean, one of the points of dependency hell is exactly that, it is having dependencies of dependencies or dependencies of dependencies of dependencies and so forth, all of which you still have to compile in Rust, so it definitely counts.

Not sure what that 55mb binary is about

and using oi for the follow up example

Static libraries compiled inside the binary, I guess?

I will add a few things later on this.

@PureArtistry
Collaborator

@BeyondMagic

I think you have some misconceptions of how cargo/rust works. There are a lot of dependencies because the rust std library is quite small: lots of features built into the C/++ std lib are instead provided as separate crates. A lot of projects also like to break up the various functions of a program into their own crates for clean code re-use.

Dependency hell also isn't worse in rust compared to anything else; every version of every package/crate is stored on crates.io. You can specify exactly which version you want to use in the Cargo.toml file, and after compilation a Cargo.lock file is generated with the details (version and checksum) of every crate used in the compilation. When building a release version it will fetch the exact crates listed in the lock file.

Rust binaries are typically larger than the C/++ equivalent but you're looking at the wrong file for the size.
The debug build of oi is 55.2mb, the release build that you actually use is 4.43mb

@BeyondMagic
Contributor Author

Rust binaries are typically larger than the C/++ equivalent but you're looking at the wrong file for the size.
The debug build of oi is 55.2mb, the release build that you actually use is 4.43mb

Well, that's true and that was my mistake, thanks for pointing it out.

However, the dependency hell I'm looking at here isn't about versions or dependencies that require certain versions; Cargo and even npm handle that well, and I'm aware of it. I'm talking about the number of dependencies -- it is a lot. In C++ that never happens if you take a little care of your project. What is in those projects (dependencies) that asks for so many? Do those dependencies really need it all? Sometimes I think if I dig deeper I can find the equivalent of this on Cargo.

Since you're already here, @PureArtistry, what feature in oi is causing this huge dependency tree? The tables feature?

Like, to download a source-page in C++ I need one library, curl (which is available everywhere), or if we're going an easier path we can use cpr (two libraries) -- very simple there. What then would be the equivalent in Rust, since you both are obviously more experienced with it than me?

@PureArtistry
Collaborator

Most of the crates in oi are deps of scraper (the html parsing lib), if you run cargo tree it will give you a full breakdown.

I think the number of packages is sort of a cultural thing, rust as a language was created around having cargo as a package manager and repo for libs.

When it's so easy to find, share & use libraries, people tend to write more modular code. Also, the compiler only uses the code from those libs that's needed, so it doesn't affect the final binary, only compilation time and disk space while working with / building the binary.

for a curl equivalent in rust you can use:
curl
ureq (what I use in oi)
reqwest (this is overkill for something like this, it's a much more featureful lib)

@simdimdim

I also think trying to minimize the number of dependencies is the wrong thing to focus on; in rust that's somewhat similar to saying you want to minimize the usage of reusable code. I'm sure rust also has silly libraries like that, but choosing to use them is an entirely different thing (we're not implored to use something like 'true' in rust :D).
As for http request libs, there's also hyper (which is also the most downloaded one, all-time and recently too); well, there are (many) others too.

Usually the amount of dependencies a project has is more a measure of how many things the project does (or needs,) or how good it is at handling code reuse.

random example: If I recall correctly the way c++ handles random number generation is part of the std, in rust, it's its own crate. It's not that std is crippled, it just doesn't need to be part of std

@BeyondMagic
Contributor Author

It's not that std is crippled, it just doesn't need to be part of std

What kind of powerful general-purpose language, and a systems language at that, doesn't come with the equivalent of rand? Perhaps that's more of a problem with the modularity-driven mindset that Rust is always into.

Usually the amount of dependencies a project has is more a measure of how many things the project does (or needs,) or how good it is at handling code reuse.

Code reuse is necessary when it is necessary, like when we need a whole HTTP request method that just does what it needs to do -- nobody wants to rewrite such a huge thing for their project only. But not everything needs to be reused, and definitely not everything needs to be its own dependency; we all have the capacity to write certain code specifically for our own project, for efficiency or even for our own modularity of code.

Look, if none of you are going to give C++ or C a chance here, we ultimately have to go with Rust, but right now I can't see a single benefit of using it over C++ for this project.

@BeyondMagic
Contributor Author

You know what, after trying more with Rust for the past few days, I can see some reasons for using it over C++ overall, but I still don't think it will be better for this project in the long run, besides the uglier syntax and bigger binary sizes 😄 (I'm joking, kinda).

However, the point of starting this thread was that I wanted to achieve the best performance and modularity for our code, not feature-richness from dependencies for tables or for a prettier output of what we scrape, and if there is a language in which this project can be rewritten and in which I can see a future ahead, it is C++.

From @PureArtistry at the start,

Just let me know if you need anything, happy to either help or stay out of your way.

If I can kindly take you up on this, I would be happy if you actively maintained oi the way you want it to be maintained. The thought of having a Rust competitor can give me enough motivation to maintain a better C++ project until it can actually replace tuxi entirely and even, if we get there, oi as well.

I will take my time to actually create the first working version of the project in C++, since it is -- you know -- C++, but we all know that doubt kills more dreams than failure ever will, so that's that.

BeyondMagic changed the title from "Dead?" to "Replacement for tuxi: *oi* and *iris*" on Dec 14, 2021
@PureArtistry
Collaborator

@BeyondMagic, sounds good mate! - the offer for help is always open but my C++ is meh (my brain doesn't do OOP)


4 participants