Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28028: Remove duplicated proto reader/writer classes introduced in HIVE-19288 #5079

Merged
merged 1 commit into from
Mar 29, 2024

Conversation

Aggarwal-Raghav
Copy link
Contributor

What changes were proposed in this pull request?

Please check JIRA comments for discussion HIVE-28028

Why are the changes needed?

There are duplicate java files present in hive which are already present in apache tez. I checked the diff between these files from hive and tez and found there are few improvement in TEZ (like TEZ-4296, TEZ-4105, TEZ-4305)

Does this PR introduce any user-facing change?

NO

Is the change a dependency upgrade?

NO

How was this patch tested?

On local mac

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@okumin
Copy link
Contributor

okumin commented Feb 11, 2024

Just for clarification. I recently reviewed #5036, which would rely on these files. I expect this PR will not require #5036 to be modified as the class in Apache Tez has the same fully qualified class name and it is compatible with the one in the Apache Hive repo.

@Aggarwal-Raghav
Copy link
Contributor Author

Just for clarification. I recently reviewed #5036, which would rely on these files. I expect this PR will not require #5036 to be modified as the class in Apache Tez has the same fully qualified class name and it is compatible with the one in the Apache Hive repo.

Thanks for the review @okumin.
Yes, the reason for this PR is to remove redundant code from Hive. After removal of these files, the functionality should work as is. Hence no changes required in #5036

@Aggarwal-Raghav
Copy link
Contributor Author

@abstractdog, can you please help review this?

@@ -116,6 +116,32 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this dependency seems to be used in itests/hive-unit and ql too, questions:

  1. would it make sense to define this in root pom's dependencyManagement with exclusions and just refer to that here?
  2. the exclusions are different in the two places, is that intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.) I followed the same pattern done for other tez dependencies line tez-dag etc. In parent pom.xml, just dependency in there and in ql/pom.xml, they have done the exclusion. Same goes for itests module and sub modules.

2.) Every tez dependency is trying to exclude some hadoop dependencies, but they are not same in submodules, for example, https://github.com/apache/hive/blob/master/itests/qtest/pom.xml#L392 and https://github.com/apache/hive/blob/master/itests/hive-minikdc/pom.xml#L150, both of then are tez-dag dependecy but have different exclusions. Hence I don't know which hadoop dependecy to exclude, and what's the history about them. That's why I kept the exclusions same as other tez dependencies in same pom.xml

If there is any suggestion around it, I am happy to re-evaluate and re work on the exclusions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abstractdog, apologies for delay. Can you provide your guidance on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Aggarwal-Raghav can you pull this to the parent pom as Laszlo suggested & for the exclusions, combine all of them together, do check once if they are actually being pulled in or not.

if this approach creates some issue, we can consider excluding dependencies individually

@Aggarwal-Raghav
Copy link
Contributor Author

@ayushtkn, If I remove all the exclusions from the tez-protobuf-history-plugin, then no log4j or any hadoop dependency is coming. Attaching the dependency tree for reference.
So, should we keep exclusions just to be on the safe side?

hive-unit_dependency_tree.txt
ql_dependency_tree.txt

@Aggarwal-Raghav
Copy link
Contributor Author

Have updated the PR, considered the suggestions from Laszlo, and have excluded hadoop dependencies which are used in https://github.com/apache/tez/blob/rel/release-0.10.3/tez-plugins/tez-protobuf-history-plugin/pom.xml

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you haven't all the excluded dependencies, earlier you had hadoop-mapreduce-client-core and others excluded in ql/pom.xml but now in the parent pom, they aren't excluded, why?

@Aggarwal-Raghav
Copy link
Contributor Author

I think you haven't all the excluded dependencies, earlier you had hadoop-mapreduce-client-core and others excluded in ql/pom.xml but now in the parent pom, they aren't excluded, why?

Will add the rest

@anishek
Copy link
Contributor

anishek commented Mar 14, 2024

The exercise to move all dependencies to hive project level common pom.xml might require more work. There already seems to be a lot of tez related dependencies that are specified in the hive project root pom.xml with no exclusions currently. The exclusions are specifically in the ql module pom.xml, the one place that i saw. moving them up might lead to change in state of other places these dependencies are included and i think should be a larger exercise and we should create another jira for the same with more experts with runtime expertise taking that forward.

For the sake of being optimist, @Aggarwal-Raghav lets just give it a shot if moving them up is easy and build runs else we can create another jira to get that in.

@abstractdog / @ayushtkn The goal here is to remove common transitive dependencies from tez also in hive i suppose and prevent further classpath inconsistencies ?

@ayushtkn
Copy link
Member

Moving all the dependencies will be a big refactor, we should definitely do it as part of a new jira, not in this. The only dependency we are adding here is "tez-protobuf-history-plugin", I think @abstractdog mentioned that we move that to the main pom & have the exclusions for it defined in the main pom, not the existing ones, for the existing ones we can create a followup if needed.

if for the "new" dependency being added here, if defining it & its exclusion in the main pom doesn't work, I am ok with following the old approach of independently defining it is ql & itests and the exclusions there.

@abstractdog
Copy link
Contributor

I believe we're quite close, also I admit that we don't have a clear policy regarding the tez dependencies and where to exclude hadoop stuff from there, so I believe, without refactoring the whole pom chaos, we should still do here what makes the most sense, which is I tried to suggest, so:

  1. include a tez dependency with all its exclusions in the root pom.xml's dependency management
  2. refer to it from child pom.xml where it's needed

+1 ignore detailed history now, just exclude the typical ones:

      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-mapreduce-client-common</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-hdfs</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-yarn-client</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
          <groupId>commons-logging</groupId>
          <artifactId>commons-logging</artifactId>
        </exclusion>
      </exclusions>

does this make sense?

@abstractdog
Copy link
Contributor

latest patch LGTM +1

Copy link

Quality Gate Passed Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

@abstractdog abstractdog merged commit 0bc624a into apache:master Mar 29, 2024
5 checks passed
@Aggarwal-Raghav Aggarwal-Raghav deleted the HIVE-28028 branch August 16, 2024 10:55
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Sep 23, 2024
…in HIVE-19288 (apache#5079) (Raghav Aggarwal reviewed by Laszlo Bodor, Ayush Saxena)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants