Ecosystem fragmentation and undocumented breaking changes #29751

Closed
brenhinkeller opened this issue Oct 21, 2018 · 38 comments

@brenhinkeller
Contributor

brenhinkeller commented Oct 21, 2018

This is technically two issues, but the effects on ordinary users are cumulative, so I'm putting it in one place.

I'm a simple domain scientist -- you know, a former matlab user. So, while I'm not the most sophisticated Julia user out there, I'm somewhat representative of a large demographic that Julia is going to want to attract in the future (and indeed, one of the only demographics that isn't ideologically opposed to Julia's one-indexing). Overall, I'm very appreciative of the hard work the core devs have put in for 1.0, and a lot of improvements like the new pkg system are great. However, there are two related problems that have been causing me (and presumably other users like me) a lot of pain lately:

(1) Ecosystem fragmentation

Ok, so now mean and std and var are in their own package, got it. But not the same package as percentile, that's over in StatsBase. And if I want to use erf that's yet another library now, SpecialFunctions.jl! Oh, and normpdf is over in StatsFuns, so that's another one to remember. What's next, sin and cos will be moved to ExtraSpecialTrigFunctions.jl? Not having to load a whole litany of special packages to do anything useful was a feature, and that feature is being actively removed. I thought Julia was supposed to be "as easy for statistics as R"? Having to import a special library for mean and a different one for erf is definitely not living up to that, that's more like "as easy for statistics as C".

The only pro I see for such changes cited here is from @ViralBShah that "having something in Base has also deterred others from trying alternate ideas", but (1) functions like mean and std and erf don't call for much innovation, and (2) what's stopping anyone from making a contribution to Base?

To a simple user, this feels like going backwards. #27834 doesn't entirely fix the problem, because these things used to just work and now they don't.

(2) Undocumented (and/or unnecessary) breaking changes

Some of these were probably inevitable with the switch to 1.0, but I've probably spent 20 hours in the last couple weeks tracking down why my code doesn't work any more, even when it doesn't raise a deprecation in 0.7. It's then somewhat infuriating to find that in some cases these changes have been made for ideological reasons with no clear practical benefit.

Probably the most egregious case I've found so far is #22828, which as others have noted turns out to break every single example of ccall in https://docs.julialang.org/en/v1.0/manual/calling-c-and-fortran-code/ with no warning or deprecation whatsoever -- and consequently any user code that was ever based on these examples. Adding insult to injury, this was pushed over clear objection from the community with the sole benefit of improving code coverage by removing features. Don't let the perfect be the enemy of the good!

Then there are the cases where a function has been removed from base with a deprecation warning pointing to an external package, but the external package forgot or decided not to include that function after all. The case I've run into recently is linreg, which was removed from base with a deprecation pointing to StatsBase, but apparently real programmers just use \ for such simple things, so why not let it disappear, who cares what it breaks!
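
For anyone hitting the missing linreg, the backslash approach mentioned above can be sketched roughly as follows. This is only an illustration under the assumption of an ordinary least-squares fit of y ≈ a + b*x; the name `linreg_sketch` is hypothetical and this is not the code that was removed from Base.

```julia
# Hypothetical stand-in for the removed `linreg`; not the original Base code.
# Fits y ≈ a + b*x by ordinary least squares and returns (a, b).
function linreg_sketch(x::AbstractVector, y::AbstractVector)
    X = [ones(length(x)) x]   # design matrix: a column of ones, then x
    a, b = X \ y              # `\` solves the least-squares problem
    return a, b
end

x = [1.0, 2.0, 3.0, 4.0]
y = 2 .* x .+ 1               # points on a line with intercept 1 and slope 2
a, b = linreg_sketch(x, y)
```

Here `a` is the intercept and `b` the slope.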

Meanwhile, other things like linspace and eye and repmat do at least raise a deprecation warning in 0.7, but were all very frequently used and have been renamed just because "eew, matlab!" And for what benefit? Yes, the new repeat syntax is objectively better than the old repmat syntax, but you all know exactly which functions I'm talking about, why not let them remain as aliases or at least with a permanent deprecation warning? Right now, the REPL gives you zero feedback for why these common idioms don't work in 1.0.


So now, a brand new user trying any number of ubiquitous idioms from mean to eye the very first time they open the Julia REPL, not only does it not work, there's no information provided about how to make it work.


tl;dr: (1) You have users now! (2) Breaking things has consequences for other people's time! (3) Not following common idioms (or at least providing a permanent deprecation) is hard for new users from other languages! Help me @JeffBezanson @ViralBShah @StefanKarpinski @ararslan -- I know at least some of you care about this sort of thing!

@JeffBezanson
Member

I've probably spent 20 hours in the last couple weeks tracking down why my code doesn't work any more, even when it doesn't raise a deprecation in 0.7

I'm sorry you've had difficulty, but please don't suffer in silence --- when you encounter problems like this, post to discourse, slack, or the issue tracker with examples of the problem you're having. We can't fix or help with problems we don't know about. Please trust that we did not furtively insert bugs in the system just to give you a hard time. If you have code that you can't share publicly, you can send it to me or another committer privately for debugging help.

Then there are the cases where a function has been removed from base with a deprecation warning pointing to an external package, but the external package forgot or decided not to include that function after all.

This is not intentional. The way we hope to handle oversights like this is for people to call them to our attention so we can fix them. It's not productive to tell us "you screwed up so many things!" Rather, please tell us which things. I acknowledge the linreg example; we should add a deprecation or shim for that. Are there other examples? To be perfectly clear, the bug is that there is no deprecation for linreg. The bug is not "there is no deprecation for linreg and you don't care about it". Please do not tell me what I care about; that is not productive.

I understand it is slightly inconvenient to write using Statistics, but I don't think this is a major usability issue. All modern languages factor library functions into modules and packages of some kind, which is by now considered a standard software engineering best practice. The term "ecosystem fragmentation" does not apply here --- in my view, that term refers to a lack of working together towards a common goal, not simply to factoring functions into packages.

one of the only demographics that isn't ideologically opposed to Julia's one-indexing

We'll take our chances.

@brenhinkeller
Contributor Author

brenhinkeller commented Oct 21, 2018

We'll take our chances.

Don't get me wrong, I'm rooting for you here!

Please trust that we did not furtively insert bugs in the system just to give you a hard time.

I do!

@brenhinkeller
Contributor Author

I acknowledge the linreg example; we should add a deprecation or shim for that. Are there other examples?

the ccall / libso issue I mentioned above would be #1 in terms of not having any deprecation. linreg does at least have a deprecation, it's just an inaccurate one

I understand it is slightly inconvenient to write using Statistics, but I don't think this is a major usability issue.

You're right, using Statistics isn't bad by itself if that were the only package I had to include for R-like basic statistical functionality, but that's not the case as I mentioned in the initial post

For more context, I'm probably already committed to Julia for my own code whichever way you folks go in the future on these sorts of issues, because the performance benefits are too substantial to ignore. What I'm trying to decide now is more whether I should teach my classes in Julia too. When I was learning scientific/technical computation for the first time, it was common for students to have no prior exposure to coding. That's not really the case any more, and my students are probably going to be familiar with things like mean, std, erf and eye before I ever get them.

@JeffBezanson
Member

I agree the organization of the stats packages doesn't seem ideal. One possibility is to merge StatsBase and StatsFuns into Statistics.

@brenhinkeller
Contributor Author

I agree the organization of the stats packages doesn't seem ideal. One possibility is to merge StatsBase and StatsFuns into Statistics.

Sounds good to me, for whatever that's worth

@ararslan
Member

StatsFuns should be merged into Distributions (see JuliaStats/StatsFuns.jl#20), and the functionality in StatsBase should be redistributed across Statistics, Random, StatsModels, Distributions, and Distances. Currently StatsBase is kind of a grab bag of abstractions that belong in StatsModels and random functionality that doesn't have another home. I expect that once we can develop and version stdlibs separately from Julia itself, there will be more motivation to consolidate.

@brenhinkeller
Contributor Author

brenhinkeller commented Oct 21, 2018

I can see the logic for that as well, though it's a bit less satisfying in terms of having R or Matlab-like scientific/technical computing functionality with only one or two usings. I suppose a metapackage like https://github.com/JuliaStats/Stats.jl could help at that point, though I imagine there are issues with that approach too when some are in stdlib and some aren't?

@elextr

elextr commented Oct 21, 2018

To voice a different viewpoint from another outsider, the "everything in base" model turns off those who want "Julia the language", not "Julia the R replacement". Julia targets so many possible communities that "special favour" of being in Base for any community will become counterproductive in the long run.

Instead why not formalise meta packages such as "Statistics _like_R" (silly fictional name) that pulls in all the packages that make R like stats easy, and maybe even provides a few convenient shim functions, or "Just_like_Matlab" etc, then you just have to using Statistics_like_R and you're done for all major stats functionality. Each community of interest can contribute to such meta packages.

@brenhinkeller
Contributor Author

Fair enough -- I guess I've got the "A fresh approach to numerical computing" / "A fresh approach to technical computing" motto in mind, but Julia may be evolving past that.

using numerics kinda has a nice ring to it on the subject of metapackage names

@nalimilan
Member

I suppose a metapackage like https://github.com/JuliaStats/Stats.jl could help at that point, though I imagine there are issues with that approach too when some are in stdlib and some aren't?

No, the only issue isn't technical, it's finding the best name (JuliaStats/StatsKit.jl#5).

@malmaud
Contributor

malmaud commented Oct 21, 2018

Also nothing stops someone from writing a meta-package that exports methods from a bunch of different packages. IPython's %pylab magic is basically like that.

Everyone is going to have different irreconcilable opinions of which methods 'deserve' to be in Base vs in a different namespace, and this way you can find/write whatever metapackage/startup.jl you want so the functions you care about are available to you unnamespaced. Meanwhile it's impossible for users to decide for themselves to move a symbol out of Base, so it makes sense to err on the side of moving things out.
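
As a rough sketch of that idea, a small wrapper module can re-export names from several modules so that a single `using` brings them all in. The module name `Numerics` is hypothetical (borrowed from the naming suggestion earlier in the thread), the list of exports is arbitrary, and only standard-library modules are used so that the example runs without installing anything.

```julia
# Hypothetical wrapper module; the name and the choice of exports are illustrative only.
module Numerics

using Statistics: mean, std, var, median
using LinearAlgebra: I, norm

# Re-export the imported names so that `using .Numerics` makes them available.
export mean, std, var, median, I, norm

end # module

using .Numerics

m = mean([1, 2, 3])            # provided by Statistics via Numerics
id = Matrix(1.0I, 2, 2)        # `I` provided by LinearAlgebra via Numerics
```

The per-user variant is to put lines such as `using Statistics` in `~/.julia/config/startup.jl`, which only changes that user's own sessions.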

@KristofferC
Member

This comes off as kinda ranty and it is hard to find actionable items here to "fix" this issue. So I'm going to close this and encourage opening more specific, actionable issues. The "higher level" discussion can of course continue but I don't see a need to keep the issue itself open.

@brenhinkeller
Contributor Author

One specific, actionable issue that I don't see covered by other issues (which I don't know the technical feasibility of, however) would be, both for functions removed from base (std,erf etc) and for those renamed (e.g. linspace) to have the default error message in the REPL explain why these common idioms don't work, e.g.

ERROR: UndefVarError: mean not defined; try using Statistics

(or such) rather than the current

ERROR: UndefVarError: mean not defined

this would (1) be helpful for new users and (2) be helpful for people upgrading from older versions given that critical packages like Plots have already dropped support for 0.7 so it's not really possible to develop in 0.7 on a regular basis

@affans
Contributor

affans commented Oct 22, 2018

I agree with the Statistics fragmentation. It is quickly getting out of control and some of the packages need to be merged into a single package. I'd rather have one large package maintained by a large community rather than having multiple specialized packages maintained by 2 or 3 folks. When 2.0 hits and packages have to update, it'll be much easier if the big, everyday use packages updated quickly.

Maybe we can add functionality in pkg. Something like typing in using Statistics pulls in all related stats packages StatsFuns, StatsBase, Distributions and so on.

@nalimilan
Member

@affans Having small packages doesn't mean they need to be maintained by separate people. Packages under the JuliaStats organization are all maintained by the same team. And the meta-package already exists (as noted above): https://github.com/JuliaStats/Stats.jl.

@JuliaLang JuliaLang deleted a comment from affans Oct 22, 2018
@JuliaLang JuliaLang deleted a comment from affans Oct 22, 2018
@brenhinkeller
Contributor Author

brenhinkeller commented Oct 22, 2018

Is there an advantage to many-small-packages even then? Still seems like probably more potential points of failure?

Stats.jl could definitely help though -- is there a way to make it include using Statistics now so we also get mean, std, etc.?

@malmaud
Contributor

malmaud commented Oct 22, 2018

At any rate, these types of larger conversations about the Julia ecosystem should be happening on Discourse, not here.

@nalimilan
Member

Is there an advantage to many-small-packages even then? Still seems like probably more potential points of failure?

Yes, modularity allows other packages to depend only on the features they need instead of loading lots of unrelated things. It also makes it easier to replace some parts with others if we find better approaches in the future (instead of being stuck with old interfaces, like R's data.frames). There's no reason why it would make things more fragile.

Stats.jl could definitely help though -- is there a way to make it include using Statistics now so we also get mean, std, etc.?

JuliaStats/StatsKit.jl#11

@KristofferC
Member

KristofferC commented Oct 22, 2018

To give a bit of an opposing viewpoint:

Yes, modularity allows other packages to depend only on the features they need instead of loading lots of unrelated things.

Loading packages in Julia should be extremely cheap and I don't think we should structure packages too much based on the performance characteristics of package loading as it is right now. There is no inherent disadvantage of depending on unrelated things, you are not using them anyway. I've never been sad when writing import scipy that I now have the ability to do too much stuff.

It also makes it easier to replace some parts with others if we find better approaches in the future (instead of being stuck with old interfaces, like R's data.frames).

I don't see why that is the case. In fact, when things are split out into different packages I usually find it harder to change something, because quite often the change needs to be coordinated among all these sub-packages, which tend to sneakily use each other's internals because they were written by the same author.

There's no reason why it would make things more fragile.

Well, typically you lose the integration tests between the functionality that are now in different repos. So instead of having a beautiful puzzle, you have pieces scattered across the room and it can be hard to see if these things still fit together in a good way. Also, documentation is usually more scattered and typically limited to "unit test" style of documentation instead of higher level usage.

As a personal opinion, contributing to packages that are heavily split out is an extreme headache. You spend a lot of time trying to find where each function lives and with all the using going on, this can be a challenge on its own. It is often that a change in one package breaks another and then you need two PRs at the same time but then you get problems with CI and you need to tag both packages at the same time etc.

@brenhinkeller
Contributor Author

JuliaStats/StatsKit.jl#11

great!

@brenhinkeller
Contributor Author

brenhinkeller commented Oct 22, 2018

At any rate, these types of larger conversations about the Julia ecosystem should be happening on Discourse, not here.

Except there are specific actionable issues here, so I want to object slightly to the "tell the frustrated noob user they're posting in the wrong place" trope

One specific, actionable issue that I don't see covered by other issues (which I don't know the technical feasibility of, however) would be, both for functions removed from base (std,erf etc) and for those renamed (e.g. linspace) to have the default error message in the REPL explain why these common idioms don't work, e.g.

ERROR: UndefVarError: mean not defined; try using Statistics

(or such) rather than the current

ERROR: UndefVarError: mean not defined

this would (1) be helpful for new users and (2) be helpful for people upgrading from older versions given that critical packages like Plots have already dropped support for 0.7 so it's not really possible to develop in 0.7 on a regular basis

A second specific, actionable feature request would be to add a permanent deprecation (i.e., it works but with a warning to use more Julian syntax), or if that's not palatable to the devs, a customized error message like above for the cases where functions have recently been renamed for, let's say, "less than technical" reasons, including but probably not limited to:

| recently removed function | what it's called now | why it matters / common to |
| --- | --- | --- |
| linspace | range | common to R, matlab, numpy |
| contains | occursin | matlab contains, numpy __contains__ |
| eye | Matrix(1.0I,n,n) | matlab eye, numpy numpy.eye |
| repmat | repeat | matlab repmat, numpy numpy.matlib.repmat |

etc!
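
For reference, the replacements listed above can be tried directly in a Julia 1.x session. This is a small sketch only; the argument values are made up for illustration, and the old spellings appear just in comments.

```julia
using LinearAlgebra                  # provides the identity operator `I`

r = range(0, stop=1, length=5)       # was: linspace(0, 1, 5)
found = occursin("lia", "Julia")     # was: contains("Julia", "lia"); note the swapped argument order
E = Matrix(1.0I, 3, 3)               # was: eye(3)
R = repeat([1 2; 3 4], 2, 3)         # was: repmat([1 2; 3 4], 2, 3)
```

Here `r` is a lazy range rather than an array; `collect(r)` gives the vector that linspace-style code usually expects.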

@ararslan
Member

ararslan commented Oct 22, 2018

I think one of the benefits of moving away from Matlab-isms is that it helps people think of Julia as Julia the language rather than Julia the Matlab replacement. There will of course be disconnects when coming from any particular language, which is why we have a "noteworthy differences from other languages" section in the documentation. That seems good enough to me to address the "how do I migrate from language X" questions.

@StefanKarpinski
Member

So the main action item here is "more helpful error messages"? That is both unobjectionable and doable. I seem to recall that there was a PR at some point with a mechanism for pattern matching on errors and customizing their error messages. It seems like that would be the right way to make something like ERROR: UndefVarError: mean not defined; try using Statistics work. We're not going to customize every single thing that someone might expect coming from another language, but we could certainly add a few more helpful messages along those lines.

@brenhinkeller
Contributor Author

I think one of the benefits of moving away from Matlab-isms is that it helps people think of Julia as Julia the language rather than Julia the Matlab replacement

Totally fair, which is why I never asked for these to be reverted -- but a deprecation or a custom error message would have no downside from that perspective!

@mbauman
Member

mbauman commented Oct 22, 2018

Extensible method errors: #24299 (needs a review), but that doesn't work for undefined names like these. A dictionary lookup could be used there. Also note that we've tried to rally a community that could tackle and maintain a MatlabCompat package that provides these translations, but so far I don't think anyone has taken up the mantle. https://discourse.julialang.org/t/why-eye-has-been-deprecated/12824
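
To make the dictionary-lookup idea concrete, here is a minimal sketch. It is not how Julia implements its error messages; the table `MOVED_OR_RENAMED` and the function `friendly_message` are hypothetical names, and the entries are simply the examples raised in this thread.

```julia
# Hypothetical lookup table; entries are taken from the examples in this thread.
const MOVED_OR_RENAMED = Dict{Symbol,String}(
    :mean     => "try `using Statistics`",
    :std      => "try `using Statistics`",
    :erf      => "try the SpecialFunctions package",
    :linspace => "use `range` instead",
    :eye      => "use `Matrix(1.0I, n, n)` with `using LinearAlgebra`",
    :repmat   => "use `repeat` instead",
)

# Build a friendlier message for an UndefVarError, falling back to the plain text
# when the undefined name is not in the table.
function friendly_message(err::UndefVarError)
    base = "UndefVarError: $(err.var) not defined"
    hint = get(MOVED_OR_RENAMED, err.var, nothing)
    return hint === nothing ? base : "$base; $hint"
end

msg = friendly_message(UndefVarError(:linspace))
```

A table like this only needs one entry per name, which is far less to carry around than a working deprecated method for each one.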

@brenhinkeller
Contributor Author

brenhinkeller commented Oct 22, 2018

So the main action item here is "more helpful error messages"? That is both unobjectionable and doable.

I would be quite happy with that!

@elextr

elextr commented Oct 22, 2018

Perhaps another, more discrete action item from the discussion is the need for messages when things move such that a user program needs to change. Even if it only needs a "using xxx", it's important to tell users this, and for a reasonable period of time, like all of 1.xx.

It's easy for language developers to forget (have been guilty myself) that not all users of Julia are following the development closely; they are doing their PhDs, high frequency share trading, and other important stuff :)

@StefanKarpinski
Member

StefanKarpinski commented Oct 22, 2018

It seems that it may also be easy for programming language users to not realize how annoying and difficult it is to lug around and maintain 16,000 lines of deprecation code (and that's just for 0.6 to 0.7). Not to mention that keeping that code is very much not free in terms of compilation time or system image size. We can certainly add some friendlier error messages but we're not keeping permanent deprecations because something was once spelled differently in some version of Julia.

@brenhinkeller
Contributor Author

It seems that it's easy for programming language users to not realize how annoying and difficult it is to lug around and maintain 16,000 lines of deprecation code. We can certainly add some friendlier error messages but we're not keeping permanent deprecations because something was once spelled differently in some version of Julia.

Guilty as charged. A custom error message would still solve my core complaint of:

a brand new user trying any number of ubiquitous idioms from mean to eye the very first time they open the Julia REPL, not only does it not work, there's no information provided about how to make it work.

so if that's easier to maintain, it's good enough for me!

@elextr

elextr commented Oct 22, 2018

It seems that it may also be easy for programming language users to not realize how annoying and difficult it is to lug around and maintain 16,000 lines of deprecation code

@StefanKarpinski of course there is always that tension, but well, the developers are the ones deciding to move/remove stuff not users, and providing warnings of some sort (deprecations or whatever form they take) for such things is just part of professional development. Some days programming is just plain work.

but we're not keeping permanent deprecations

Nobody said permanent, but for one major version.

As I read the FAQ technically you could deprecate today then release a major version tomorrow that removed the deprecated function and still comply with the "rule" as written.

Of course instability pre-1.0 is expected and IIRC explicitly warned about somewhere (google failed me though) but it seems that acceptance of Julia has been somewhat quicker than expected and pre-1.0 versions are used in production.

Having users who expect stability and who have other stuff to do rather than modify their programs so they continue to work vs developers who want to change things for improvements/better organisation/easier maintainability is a normal part of managing a language and libraries.

The OP did a useful thing in reminding the project and other package developers of the non-developer viewpoint, and now 1.0 is out maybe the policy needs revisiting and tweaking and promulgating to optimise the outcome for both parties.

@StefanKarpinski
Member

StefanKarpinski commented Oct 22, 2018

for a reasonable period of time, like all of 1.xx

That's roughly permanent.

@brenhinkeller
Contributor Author

brenhinkeller commented Oct 22, 2018

I guess if I were to further defend why this sort of request might merit developer time, I would claim that something like linspace, which is common to R, matlab, and numpy (probably among others) in addition to being the former usage in Julia, presents a bit more compelling of a case for special behavior than just

because something was once spelled differently in some version of Julia

but for my part in any case, a custom error message would make me happy since "tells me a quick fix to make it just work" isn't that much worse than just works in my book, and is worlds better to a newbie than plain UndefVarError

@JeffBezanson
Member

and providing warnings of some sort (deprecations or whatever form they take) for such things is just part of professional development. Some days programming is just plain work.

Several of us spent a huge amount of time on deprecations --- literally thousands of them --- and multiple mechanisms for providing them. They're in v0.7, which is compatible with v1.0. I promise I didn't enjoy it, if that makes you feel any better. So don't act like we've never heard of the concept.

In multiple discourse posts and in the 1.0 release blog post we explained that the way to update code is to first use the transitional 0.7 release, which gives warnings spelling out how to update code.

AFAICT everybody agrees with the idea of adding more helpful messages in 1.x for using certain functions that are now defined in packages. Let's please try to keep this discussion focused and productive.

@elextr

elextr commented Oct 23, 2018

Some days programming is just plain work.

I promise I didn't enjoy it, if that makes you feel any better.

I was sympathising with you, not criticising you or your work. I feel like you (as in the project not just @JeffBezanson personally) are being defensive in this issue when you don't need to be.

In multiple discourse posts and in the 1.0 release blog post we explained that the way to update code is to first use the transitional 0.7 release, which gives warnings spelling out how to update code.

That is certainly key information, having a "transition" release that can be used to flag all such upgrades is a viable alternative to the "deprecate it for a whole major version so users have time to notice" option I was suggesting.

But it may be that this approach is not widely enough disseminated to users. Not all users are following all discussions on discourse and a blog post at release is unlikely to be read by a user who updates some time later. I don't have any magic answer for reaching all users I am afraid, just repeat it everywhere, on the download page, as an "Upgrade" section immediately after "getting started" in the documentation, etc etc.

@brenhinkeller
Contributor Author

That is certainly key information, having a "transition" release that can be used to flag all such upgrades is a viable alternative to the "deprecate it for a whole major version so users have time to notice" option I was suggesting.

I think this makes it on-topic to reiterate that I personally would have used 0.7 exclusively for several months (giving plenty of time to fix deprecated functions at leisure, using those deprecations you spent so much time on) if it weren't for JuliaPlots/Plots.jl#1760

I promise I didn't enjoy it, if that makes you feel any better.

I kinda suspect there isn't a single person in the Julia ecosystem who takes joy in your suffering. Positive and negative feedback together carries much more information than either alone though, and dealing well with the latter is an important test of a mature open-source community.

I'm happy with the apparent outcome here, thanks for your work!

@JeffBezanson
Member

I was sympathising with you, not criticising you or your work.

Ok, sorry for misinterpreting your comment. It sounded to me like you were saying we had not given any thought to the upgrade path.

@nalimilan
Member

FWIW (since this has been claimed several times), linspace doesn't exist in R.

@brenhinkeller
Contributor Author

Sorry, I'm not much of an R user so was just going off a quick google search - I think I was led astray by this https://www.rdocumentation.org/packages/pracma/versions/1.9.9/topics/linspace
