Increasingly slow feedback loop for developers, increasingly large WAR files #5593
Comments
Summarizing a very non-systematic experiment I mentioned to @pdurbin:
|
Hi @pdurbin, thank you for opening this issue and for your hard work getting these numbers. During my testing and development for IQSS/dataverse-kubernetes, I also noticed the long deploy times. Adding my 2 cents to this (just ignore me if you disagree):
|
@poikilotherm That was definitely a crude experiment at stripping things out (the |
@scolapasta will get some questions together for a spike related to this. |
Some action items out of tech hours that we can act upon:
As a separate issue:
|
I guess this issue has become #5736. Closing. |
This issue still drives me absolutely crazy. Re-opening. This is the command I just ran: time (mvn package && asadmin-payara deploy --force target/dataverse-4.20.war && curl http://localhost:8080/api/info/version) It indicates that it takes a minute and 48 seconds to compile and deploy. Here's the output:
It's a productivity killer. |
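To make timings like the one above easier to compare between machines, here is a minimal sketch that times each step of the build, deploy, verify loop separately. The commands mirror the one quoted in the comment; the helper names (`time_steps`, `feedback_loop_steps`) are made up for illustration, and the sketch assumes the deploy tool is on the PATH as `asadmin` (the comment uses `asadmin-payara`, which looks like a local alias).

```python
import subprocess
import time


def time_steps(steps):
    """Run each (label, command) pair in order and return (label, seconds) pairs.

    Raises subprocess.CalledProcessError at the first failing command,
    so a broken build is never reported as a fast one.
    """
    timings = []
    for label, command in steps:
        start = time.perf_counter()
        subprocess.run(command, check=True)
        timings.append((label, time.perf_counter() - start))
    return timings


def feedback_loop_steps(war_path, asadmin="asadmin", base_url="http://localhost:8080"):
    """Build the command list for one compile/deploy/verify cycle.

    The executable name and base URL are assumptions; adjust them
    to match your own installation.
    """
    return [
        ("build", ["mvn", "package"]),
        ("deploy", [asadmin, "deploy", "--force", war_path]),
        ("verify", ["curl", "--fail", "--silent", base_url + "/api/info/version"]),
    ]
```

Calling `time_steps(feedback_loop_steps("target/dataverse-4.20.war"))` from the project root would then show whether the time goes into compiling or into deployment, which matters when deciding between a leaner WAR and hot swapping.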
I'm on fc37fac and just timed deployment from Netbeans. It's still excruciatingly slow. After changing some back end code (DatasetPage.java) I hit F6 which means "run project" and it took 2 minutes and 43 seconds to compile and deploy. It's a productivity killer. |
|
Like everybody else, I support speeding it up, and I'm happy that it's been scheduled. (Also, since we are primarily talking about developers - as a developer you are likely using direct deployment, from the target/ directory in your project - bypassing the war file stage... there's still some extra copying of these dependent jars, but it's gotta be a fairly negligible overhead) So yes, a practical solution will likely have to be some form of hotswapping. It's great to hear that there may be some good open source tools available (let's look into HotswapAgent, yes) |
Here's another data point. I'm hacking away on something API-related. I change the code. I hit F6 in Netbeans to redeploy. It takes 64 seconds before the app is ready. To me, this is too slow and I feel like we can do better. I'm on 5.3 with a brand new MacBook. |
The Dataverse needs to be rewritten in a language other than compiled Java. Typescript sounds good. The current development environment is unsustainable. |
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment. |
As predicted, this issue was automatically closed by the sweep we're making through old issues: https://groups.google.com/g/dataverse-community/c/lDVJq-1CHLY/m/Vl824d8fAQAJ . That's fine. In practice, I'm happier these days, mostly thanks to efforts by @beepsoft that led to write-ups in https://guides.dataverse.org/en/6.3/container/dev-usage.html#ide-trigger-code-deploy that allow me to often redeploy code quite quickly. It's not perfect. Sometimes, especially for larger code changes, I have to shut down the whole works and get it running again. But when quick redeploys work they save me a ton of time. Please note that our war file is still a big fat pig. 🐷 And it's still slow to deploy. But maybe we can open a fresh issue about that some day! 😅 |
During this week's tech hours I expanded on what I said in Slack the other day:
Specifically, I started a spreadsheet called "Development Feedback Loop Time in Seconds" to start quantifying my claim above. As of commit 1d37e99 it took 36 seconds to get feedback on the most trivial change I could think of: a small tweak to the code behind /api/info/version to add a little more output. Other developers are welcome to add to this spreadsheet to talk about their own experiences (or leave a comment here): https://docs.google.com/spreadsheets/d/12Co_WHgouTPC6tQkL9XyTMJyr_-ZOHPBLW4Or57bqWA/edit?usp=sharing
I explained that when I've hacked on PHP apps in the past, feedback is fairly immediate. Feedback from Java will always be slower, I think, but part of my point is that a tiny Java web app like https://github.com/pdurbin/javaee-docker (that does nothing but print "It works!") will give developers a much faster feedback loop than Dataverse. Why is this?
One theory is that the feedback loop is a function of the size of the WAR file you are attempting to deploy. This makes sense to me because, anecdotally speaking, Dataverse was more lean and mean at our 4.0 release than it is now at 4.11. I would be happy to repeat the /api/info/version test above (or equivalent) to see how long the feedback loop is. My memory tells me that I used to get feedback faster and that Dataverse is getting slower and slower to deploy over time. @landreev seemed to agree that deployment time has increased with later releases of Dataverse, based on his recent experience deploying every release of Dataverse from 4.0 to present in order to put the new "create" scripts in pull request #5317.
Last night I created a spreadsheet called "Size of Dataverse WAR file over time": https://docs.google.com/spreadsheets/d/1uL5CVGhMh6Vcr_UUgwbrcHterz7BM1qMPPPrk6ao4ZY/edit?usp=sharing
Here's the data on how much the size of the Dataverse WAR file has increased from 4.0 until 4.11 (from the spreadsheet above):
As you can see, Dataverse 4.0 was 45 MB and Dataverse 4.11 is 187 MB, more than four times as large. In between, there were a few jumps that are probably worth mentioning:
Does size matter? @AdamBien says it does. He wrote "WAR sizes are directly related to deployment speed and so productivity" in his post at http://www.adam-bien.com/roller/abien/entry/ears_wars_and_size_matters and on his podcast he promotes the idea of "thin WARs". The argument is that Java EE (Jakarta EE now) has so many APIs that your WAR file should be mostly business logic. As of e707a22 cloc indicates Dataverse is 137,301 lines of Java, 221,090 lines overall (full report below). How big would the "thin WAR" be with just business logic code and zero dependencies on anything but Java EE? I don't know. My gut is telling me that dependencies (AWS SDK, etc.) make up the bulk of the 187 MB in the Dataverse 4.11 WAR file.
An approach I'm less familiar with is "hollow WARs" but I believe the idea is that you put as many dependencies as you can into your application server. Your code still relies on those dependencies but they are no longer in the WAR file itself.
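The hunch that dependencies make up the bulk of the WAR can be checked directly, because a WAR is a zip archive and bundled libraries live under WEB-INF/lib while the application's own compiled classes live under WEB-INF/classes. The following is only a rough sketch; the function names and the three category labels are made up for illustration.

```python
import zipfile
from collections import Counter


def war_breakdown(war_path):
    """Sum the compressed size in bytes of a WAR's entries by category.

    "lib" is bundled dependency jars, "classes" is the application's
    own compiled code, "other" is everything else (pages, static
    resources, descriptors).
    """
    totals = Counter({"lib": 0, "classes": 0, "other": 0})
    with zipfile.ZipFile(war_path) as war:
        for info in war.infolist():
            if info.is_dir():
                continue
            if info.filename.startswith("WEB-INF/lib/"):
                totals["lib"] += info.compress_size
            elif info.filename.startswith("WEB-INF/classes/"):
                totals["classes"] += info.compress_size
            else:
                totals["other"] += info.compress_size
    return dict(totals)


def lib_share(breakdown):
    """Return the fraction of the archive taken up by bundled jars."""
    total = sum(breakdown.values())
    return breakdown["lib"] / total if total else 0.0
```

Running `lib_share(war_breakdown("target/dataverse-4.11.war"))` (the path is an example) would give a first estimate of how much a "thin" or "hollow" WAR could save, since the "lib" portion is what those approaches move out of the archive.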
I've focused on size above but are there other approaches? According to the JRebel website, "JRebel fast tracks Java application development by skipping the time-consuming build and redeploy steps in the development process. JRebel helps developers be more productive by viewing code changes in real time and maintaining state." That sounds like the problem I'm describing, but is JRebel a band-aid rather than a fix for the root cause?
Should we split Dataverse into microservices? I mentioned this at tech hours but the general feeling is that microservices will give us a new set of problems.
So, to summarize, the problems are:
Questions:
I'm opening this issue because I was asked to after our conversation during tech hours this week but what I wrote above is only to kick off the conversation. What do you think about all this? What are your experiences? What are your ideas? Please leave comments.
By the way, the "size of WAR" chart above was made by downloading all the "pages" from the GitHub API with commands like curl https://api.github.com/repos/IQSS/dataverse/releases?page=1 and then manually combining the files into a single JSON file I could use with this (ugly) one-liner:
cat all.json | jq '.[].assets[] | {name, size}' | grep war -A1 | sed ':a;N;$!ba;s/,\n/ /g' | grep -v '\-\-' | perl -lane 'print "@F[1,3]"' | tr -d '"' | tr ' ' '\t' | tac | sed 's/dataverse-//g' | sed 's/\.war//g' > data.tsv
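For anyone who wants to reproduce the chart data without the sed/perl gymnastics, here is a rough Python sketch of the same transformation. It assumes all.json is the combined array of release objects described above, each with an "assets" list of entries that have "name" and "size" fields, and that releases are listed newest first (which is what the tac step implies). The function names are made up for illustration.

```python
import json


def load_releases(path):
    """Read the manually combined releases file (e.g. all.json)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def war_sizes(releases):
    """Return (version, size_in_bytes) pairs for WAR assets, oldest first."""
    rows = []
    for release in releases:
        for asset in release.get("assets", []):
            name = asset["name"]
            if name.endswith(".war"):
                # "dataverse-4.11.war" becomes "4.11", like the two sed calls.
                version = name[: -len(".war")].replace("dataverse-", "")
                rows.append((version, asset["size"]))
    rows.reverse()  # same job as tac: flip newest-first into oldest-first
    return rows


def to_tsv(rows):
    """Format rows as tab-separated lines, one release per line."""
    return "".join(f"{version}\t{size}\n" for version, size in rows)
```

Something like `print(to_tsv(war_sizes(load_releases("all.json"))), end="")` should then produce the same two-column output as data.tsv. Unlike the grep in the one-liner, this only keeps assets whose names end in .war.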