Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency update: Apache Common Lang v2 -> v3 #6070

Closed
poikilotherm opened this issue Aug 2, 2019 · 6 comments · Fixed by #7801
Closed

Dependency update: Apache Common Lang v2 -> v3 #6070

poikilotherm opened this issue Aug 2, 2019 · 6 comments · Fixed by #7801

Comments

@poikilotherm
Copy link
Contributor

poikilotherm commented Aug 2, 2019

This is related to my other dependency housekeeping things. See #5288 and #5360.


While programming unit tests for #6000, I wanted to use org.apache.commons.lang3.RandomStringUtils. In Maven POM I saw that Dataverse is still on org.apache.commons.lang v2.6, which is dead since 2011, when 3.0 was released.
See also: http://www.h-online.com/open/news/item/Apache-Commons-Lang-3-0-makes-a-break-with-the-past-1283761.html

Dataverse uses a lot of Commons Lang stuff. It should be refactored to use v3 of the lib, which gets included anyway, as it is a dep of com.lyncode:xoai-common. (see mvn dependency:tree, grepping for commons-lang). Nothing else is relying on the v2 lib.

Notes about upgrading from upstream project: https://commons.apache.org/proper/commons-lang/article3_0.html

@poikilotherm poikilotherm changed the title Dependency update: Apache Common Lang Dependency update: Apache Common Lang v2 -> v3 Aug 2, 2019
@djbrooke
Copy link
Contributor

djbrooke commented Aug 5, 2019

Hi @poikilotherm, do you plan to work on this one or should I keep it unassigned?

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Aug 5, 2019

I can submit a PR, but I thought about discussing this first? Maybe there is some technical issue with this or you guys want smaller chunks or whatever...

Just wanted to be polite, ask and discuss before I do... 😉

If you @djbrooke want me to work on this and everybody at IQSS is fine with that, this should be a rather small thing to do... I'll take an assignment as a "go!".

@pdurbin
Copy link
Member

pdurbin commented Aug 6, 2019

@poikilotherm I'm wondering if we should wait until we delete the all the locally build jars in "local_lib/com/lyncode" from the codebase before we take on this refactoring. These jars: https://github.com/IQSS/dataverse/tree/v4.15.1/local_lib/com/lyncode

I found these by reading this comment in the pom.xml:

    <!-- EXPERIMENTAL: -->
    <!-- lyncode xoai OAI-PMH implementation: -->
    <!-- unfortunately, their 4.10 version -->
    <!-- is still buggy. As an experiment, I'm using -->
    <!-- a patched version I built locally. -->
    <!-- (pull requests pending - L.A. -->
    <dependency>
        <groupId>com.lyncode</groupId>
        <artifactId>xoai-common</artifactId>
        <version>4.1.0-header-patch</version>
    </dependency>

What I'm trying to say is that maybe we should just stick with the v2 version (unless there's a security issue) until we have time to move completely to v3. As I said in this issue, I'm worried about the war file getting bigger and bigger: Increasingly slow feedback loop for developers, increasingly large WAR files #5593

@pdurbin
Copy link
Member

pdurbin commented Aug 7, 2019

Whoops, I had some misunderstanding but @poikilotherm set me straight at http://irclog.iq.harvard.edu/dataverse/2019-08-06#i_102177

I'd like to back up to what problem we're trying to solve here.

Originally, @poikilotherm said he'd like to use v3 of Apache Common Lang. As he stated above, v3 is already available in Dataverse because is it pulled in as a transitive dependency. He could use v3 right now but it's a bad practice to write code that relies on a transitive dependency. We could declare v3 as an explicit dependency in the pom and keep v2 in there as well but this is not desirable to @poikilotherm . He would rather change the version in the pom from v2 to v3.

The second problem @poikilotherm is trying to solve is that v2 is old. It's flagged as "legacy" by its developers. I assume Dataverse has a lot of old dependencies like this. I don't know if there is any immediate risk in this case. I don't know of any security problems or anything.

The third problem @poikilotherm is trying to solve is to help get the war file to be smaller. It is unknown how much smaller the war file will become.

Those are the benefits. What are the costs or risks? There's @poikilotherm 's valuable time, of course, and he estimates that 87 Java files will be affected. That seems like a lot to me but he said he's willing to provide a list of what to test to QA. I guess we'll see how long that list before we can guess how much of a QA effort this will become.

I'm not personally very enthusiastic about the benefits above (no new features or bug fixes) but if @poikilotherm wants to go ahead and make a pull request and summarize what features need to be tested, I'm happy to review it.

@poikilotherm
Copy link
Contributor Author

poikilotherm commented Sep 3, 2019

Alright, my findings so far in develop@09fe94bd:

Text occurences of Apache Commons Lang V3

find src/main/java src/test/java -name '*.java' -exec grep "commons\.lang3\." {} \; | sort | uniq -c | sort -n
      1 import org.apache.commons.lang3.math.NumberUtils;
      1 import org.apache.commons.lang3.StringUtils;
      1 import org.apache.commons.lang3.text.WordUtils;
      3 import org.apache.commons.lang3.StringUtils;

So there is already bad code in place: usage of a transitive dependency...

Text occurences of Apache Commons V2

$ find src/main/java src/test/java -name '*.java' -exec grep "commons\.lang\." {} \; | sort | uniq -c | sort -n
      1 import org.apache.commons.lang.ArrayUtils;
      1 import org.apache.commons.lang.builder.*;
      1 import org.apache.commons.lang.builder.ToStringBuilder;
      1 import org.apache.commons.lang.builder.ToStringStyle;
      1 import org.apache.commons.lang.mutable.MutableBoolean;
      1 import org.apache.commons.lang.mutable.MutableLong;
      1 import org.apache.commons.lang.NotImplementedException;
      1 import static org.apache.commons.lang.StringUtils.isEmpty;
      2 import static org.apache.commons.lang.StringUtils.isNumeric;
      4 import org.apache.commons.lang.RandomStringUtils;
      8 import org.apache.commons.lang.StringEscapeUtils;
      9 import org.apache.commons.lang.*;
     56 import org.apache.commons.lang.StringUtils;

Classes using Apache Commons Lang V2

Click to expand list of 81 files "using" it in source code...
$ find src/main/java src/test/java -name '*.java' -exec grep -l "commons\.lang\." {} \; | sort -f | cut -f2,8- -d"/"
main/api/AbstractApiBean.java
main/api/datadeposit/SwordServiceBean.java
main/api/datadeposit/UrlManager.java
main/api/DatasetFieldServiceApi.java
main/api/Groups.java
main/api/HarvestingServer.java
main/api/imports/ImportDDIServiceBean.java
main/api/imports/ImportServiceBean.java
main/api/Search.java
main/authorization/AuthTestDataServiceBean.java
main/authorization/DataverseRolePermissionHelper.java
main/authorization/providers/builtin/DataverseUserPage.java
main/authorization/providers/builtin/PasswordEncryption.java
main/dataaccess/TabularSubsetGenerator.java
main/DataCitation.java
main/DataFileServiceBean.java
main/DataFileTag.java
main/DatasetField.java
main/DatasetFieldCompoundValue.java
main/DatasetFieldValidator.java
main/DatasetFieldValue.java
main/DatasetFieldValueValidator.java
main/DatasetPage.java
main/DatasetServiceBean.java
main/datasetutility/AddReplaceFileHelper.java
main/DatasetVersionDifference.java
main/DatasetVersionServiceBean.java
main/DataverseHeaderFragment.java
main/DataversePage.java
main/DOIDataCiteRegisterService.java
main/DvObjectServiceBean.java
main/EditDatafilesPage.java
main/engine/command/impl/ImportDatasetCommand.java
main/Guestbook.java
main/GuestbookPage.java
main/HandlenetServiceBean.java
main/harvest/client/HarvesterServiceBean.java
main/harvest/client/oai/OaiHandler.java
main/harvest/server/web/servlet/OAIServlet.java
main/HarvestingClientsPage.java
main/HarvestingSetsPage.java
main/ingest/IngestableDataChecker.java
main/ingest/tabulardata/impl/plugins/csv/CSVFileReader.java
main/ingest/tabulardata/impl/plugins/dta/DTAFileReader.java
main/ingest/tabulardata/impl/plugins/dta/NewDTAFileReader.java
main/ingest/tabulardata/impl/plugins/por/PORFileReader.java
main/ingest/tabulardata/impl/plugins/rdata/RDATAFileReader.java
main/ingest/tabulardata/impl/plugins/rdata/RTabFileParser.java
main/ingest/tabulardata/impl/plugins/sav/SAVFileReader.java
main/ingest/tabulardata/impl/plugins/xlsx/XLSXFileReader.java
main/ingest/tabulardata/InvalidData.java
main/MailServiceBean.java
main/ManageFilePermissionsPage.java
main/ManageGroupsPage.java
main/ManagePermissionsPage.java
main/mydata/DataRetrieverAPI.java
main/mydata/MyDataFilterParams.java
main/mydata/MyDataFinder.java
main/mydata/RoleTagRetriever.java
main/mydata/SolrQueryFormatter.java
main/NavigationWrapper.java
main/passwordreset/PasswordResetPage.java
main/RoleAssigneeServiceBean.java
main/RolePermissionFragment.java
main/rserve/RemoteDataFrameService.java
main/rserve/RJobRequest.java
main/rserve/VariableNameCheckerForR.java
main/search/AdvancedSearchPage.java
main/search/IndexServiceBean.java
main/search/SearchIncludeFragment.java
main/search/SearchUtil.java
main/ThemeWidgetFragment.java
main/UserServiceBean.java
main/util/FileSortFieldAndOrder.java
main/util/MarkupChecker.java
main/util/SumStatCalculator.java
main/validation/PasswordValidatorUtil.java
test/api/DatasetsIT.java
test/DataCitationTest.java
test/mydata/SolrQueryFormatterTest.java
test/validation/PasswordValidatorTest.java

Static code analysis detects real usage

$ jdeps -v target/classes | grep "commons\.lang\." | tr -s " " | cut -f4 -d" " | sort -f | uniq -c
      3 org.apache.commons.lang.ArrayUtils
      2 org.apache.commons.lang.builder.ToStringBuilder
      2 org.apache.commons.lang.builder.ToStringStyle
      1 org.apache.commons.lang.mutable.MutableBoolean
      1 org.apache.commons.lang.mutable.MutableLong
      1 org.apache.commons.lang.NotImplementedException
      5 org.apache.commons.lang.RandomStringUtils
      7 org.apache.commons.lang.StringEscapeUtils
     66 org.apache.commons.lang.StringUtils

Static code analysis shows which files are really using the library

Click to view 79 classes using org.apache.commons.lang.*
jdeps -v target/classes target/test-classes | grep "commons\.lang\." | tr -s " " | cut -f2 -d" " | sort -u | cut -f5- -d"." 
api.AbstractApiBean
api.datadeposit.SwordServiceBean
api.datadeposit.UrlManager
api.DatasetFieldServiceApi
api.DatasetsIT
api.Groups
api.HarvestingServer
api.imports.ImportDDIServiceBean
api.imports.ImportServiceBean
api.Search
authorization.AuthTestDataServiceBean
authorization.DataverseRolePermissionHelper
authorization.providers.builtin.DataverseUserPage
authorization.providers.builtin.PasswordEncryption
dataaccess.TabularSubsetGenerator
DataCitation
DataCitationTest
DataFileServiceBean
DataFileTag
DatasetField
DatasetFieldCompoundValue
DatasetFieldValidator
DatasetFieldValue
DatasetFieldValueValidator
DatasetPage
DatasetServiceBean
datasetutility.AddReplaceFileHelper
DatasetVersionDifference
DatasetVersionServiceBean
DataverseHeaderFragment
DataversePage
DOIDataCiteRegisterService
DvObjectServiceBean
EditDatafilesPage
engine.command.impl.ImportDatasetCommand
GuestbookPage
HandlenetServiceBean
harvest.client.HarvesterServiceBean
harvest.client.oai.OaiHandler
HarvestingClientsPage
HarvestingSetsPage
harvest.server.web.servlet.OAIServlet
ingest.IngestableDataChecker
ingest.tabulardata.impl.plugins.csv.CSVFileReader
ingest.tabulardata.impl.plugins.dta.DTAFileReader
ingest.tabulardata.impl.plugins.dta.NewDTAFileReader
ingest.tabulardata.impl.plugins.por.PORFileReader
ingest.tabulardata.impl.plugins.rdata.RDATAFileReader
ingest.tabulardata.impl.plugins.rdata.RTabFileParser
ingest.tabulardata.impl.plugins.sav.SAVFileReader
ingest.tabulardata.impl.plugins.xlsx.XLSXFileReader
ingest.tabulardata.impl.plugins.xlsx.XLSXFileReader$SheetHandler
ingest.tabulardata.InvalidData
MailServiceBean
ManageGroupsPage
ManagePermissionsPage
mydata.DataRetrieverAPI
mydata.MyDataFilterParams
mydata.MyDataFinder
mydata.SolrQueryFormatter
mydata.SolrQueryFormatterTest$SolrQueryFormatterNoParamTest
NavigationWrapper
passwordreset.PasswordResetPage
RoleAssigneeServiceBean
RolePermissionFragment
rserve.RemoteDataFrameService
rserve.RJobRequest
rserve.VariableNameCheckerForR
search.AdvancedSearchPage
search.IndexServiceBean
search.SearchIncludeFragment
search.SearchUtil
ThemeWidgetFragment
UserServiceBean
util.FileSortFieldAndOrder
util.MarkupChecker
util.SumStatCalculator
validation.PasswordValidatorTest$Params
validation.PasswordValidatorUtil

Unit test coverage for these classes
Looks like this is very mixed result.

Click to expand test coverage calculation
for I in `cat classes-oacl`
do
grep ",$I," target/site/jacoco/jacoco.csv | awk -F, '{ print $5/($4+$5)*100 " " $2 "." $3 }'
done | LC_ALL=C sort -n | sed -e "s/edu.harvard.iq.dataverse.//"
0 DOIDataCiteRegisterService
0 DatasetPage
0 DatasetVersionDifference
0 DataverseHeaderFragment
0 DataversePage
0 DvObjectServiceBean
0 EditDatafilesPage
0 GuestbookPage
0 HarvestingClientsPage
0 HarvestingSetsPage
0 ManageGroupsPage
0 ManagePermissionsPage
0 NavigationWrapper
0 RolePermissionFragment
0 ThemeWidgetFragment
0 api.DatasetFieldServiceApi
0 api.Groups
0 api.HarvestingServer
0 api.Search
0 api.datadeposit.SwordServiceBean
0 api.datadeposit.UrlManager
0 api.imports.ImportDDIServiceBean
0 api.imports.ImportServiceBean
0 authorization.AuthTestDataServiceBean
0 authorization.DataverseRolePermissionHelper
0 datasetutility.AddReplaceFileHelper
0 engine.command.impl.ImportDatasetCommand
0 harvest.client.HarvesterServiceBean
0 harvest.client.oai.OaiHandler
0 harvest.server.web.servlet.OAIServlet
0 ingest.tabulardata.InvalidData
0 ingest.tabulardata.impl.plugins.por.PORFileReader
0 ingest.tabulardata.impl.plugins.rdata.RDATAFileReader
0 ingest.tabulardata.impl.plugins.rdata.RTabFileParser
0 ingest.tabulardata.impl.plugins.xlsx.XLSXFileReader
0 mydata.DataRetrieverAPI
0 mydata.MyDataFilterParams
0 mydata.MyDataFinder
0 passwordreset.PasswordResetPage
0 rserve.RJobRequest
0 rserve.RemoteDataFrameService
0 rserve.VariableNameCheckerForR
0 search.AdvancedSearchPage
0 search.SearchIncludeFragment
0 util.SumStatCalculator
0.28393 MailServiceBean
0.397456 UserServiceBean
0.505433 search.IndexServiceBean
1.9802 HandlenetServiceBean
2.01845 DatasetServiceBean
3.03605 DatasetVersionServiceBean
3.55086 RoleAssigneeServiceBean
4.7619 authorization.providers.builtin.DataverseUserPage
8.22237 DataFileServiceBean
9.35412 api.AbstractApiBean
10.0428 dataaccess.TabularSubsetGenerator
13.1356 DatasetFieldCompoundValue
43.9465 DatasetField
45.696 ingest.tabulardata.impl.plugins.dta.DTAFileReader
46.2121 DataFileTag
49.2537 DatasetFieldValue
50.6716 ingest.tabulardata.impl.plugins.sav.SAVFileReader
50.8178 ingest.IngestableDataChecker
51.4436 search.SearchUtil
54.9595 DataCitation
65.2174 DatasetFieldValidator
73.1868 DatasetFieldValueValidator
85.3357 ingest.tabulardata.impl.plugins.dta.NewDTAFileReader
89.3822 ingest.tabulardata.impl.plugins.csv.CSVFileReader
91.4286 authorization.providers.builtin.PasswordEncryption
91.7208 validation.PasswordValidatorUtil
97.2477 util.MarkupChecker
97.4619 mydata.SolrQueryFormatter
100 util.FileSortFieldAndOrder

poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 14, 2021
…6070

Also switch from bulk imports of commons.lang3.* to single class style.
Fixing for other parts like java.util, too.
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 14, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue Apr 14, 2021
IQSS#6070

Apache Commons Lang3 3.12.0 moved `StringEscapeUtils` to Apache Commons
Text. Also, escapeHtml() and escapeXml() from Apache Commons Lang (v2)
have been renamed to escapeHtml4() and escapeXml10() with v3.0
@poikilotherm
Copy link
Contributor Author

I just created a PR for this, as I want to use stuff from Lang3 (see PR).
Looks like there is a bug hiding in the CSV ingest code. Would appreciate some help with this. Thx.

landreev added a commit to poikilotherm/dataverse that referenced this issue May 17, 2021
poikilotherm added a commit to poikilotherm/dataverse that referenced this issue May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants