
HCFS August 2016 Sprint planning and walk through


Fri 8/19 9am:

Roll call:

  • Roman Shaposhnik
  • PC from IBM
  • John Mertic
  • Matt Foley
  • Pradeep Roy
  • Susan Malaika
  • Chris Nauroth

Action items:

  • Matt: to commit/send PR for his doc to the feature branch https://github.com/odpi/specs/tree/ODPI-168
  • Roman: get back to the Operations working group with feedback on managing 3rd-party HCFS filesystems
  • PC: to share the IBM wiki/docs documenting how GPFS integrates with Ambari

Meeting notes:

  • Matt presented his HCFS installation and configuration paper for 3rd-party filesystems. Blob-store vs. FS is not a useful distinction; nowadays you really have to dig into consistency, atomicity, and durability. For example, Azure, despite being a blob store, provides all three while some of the filesystem products don't. Matt's doc is a perfect starting point for the spec section (ODPI-168). A feature branch was created into which Matt is expected to commit this doc and where everybody is expected to start turning it into spec language: https://github.com/odpi/specs/tree/ODPI-168. Ambari has only been integrated with GlusterFS. We need to figure out where the section on Ambari/HCFS belongs, but it seems we need such a section (ODPI-169).
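
    For reference, the Hadoop contract-test framework captures exactly these behavioral differences as per-filesystem option flags rather than a blob-store/FS label. The sketch below is only illustrative; the resource path and the fs.contract.* keys are the ones I recall from the Hadoop tree and should be double-checked there:

    ```java
    import org.apache.hadoop.conf.Configuration;

    // Illustrative probe of contract option flags; the resource name and the
    // fs.contract.* keys are examples to be verified against the Hadoop source.
    public class ContractFlagsProbe {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Contract definitions ship as XML resources on the test classpath,
            // e.g. contract/localfs.xml for the local filesystem.
            conf.addResource("contract/localfs.xml");

            // Behavioral flags of the kind Matt's doc argues the spec should name directly.
            boolean blobStore    = conf.getBoolean("fs.contract.is-blobstore", false);
            boolean atomicRename = conf.getBoolean("fs.contract.supports-atomic-rename", false);
            boolean append       = conf.getBoolean("fs.contract.supports-append", false);

            System.out.printf("blobstore=%b atomic-rename=%b append=%b%n",
                    blobStore, atomicRename, append);
        }
    }
    ```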

  • PC described the integration with GPFS. They ran into a number of Ambari limitations, and the original connector approach was limiting. The new connector takes the HDFS client jar and implements all the RPCs it needs. Thus you don't need a new stack; you just have HCFS (GPFS) as a service, and then you can integrate/unintegrate it. It is all documented and available on the IBM wiki.
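
    For background on what "no new stack" means on the client side: a third-party HCFS implementation is normally wired in purely through configuration. The snippet below is a hypothetical sketch (the gpfs scheme and the implementation class name are placeholders, not the actual IBM connector), showing the standard fs.<scheme>.impl / fs.defaultFS mechanism:

    ```java
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Hypothetical wiring of a third-party HCFS implementation; the "gpfs" scheme
    // and the class name below are placeholders, not the real IBM connector.
    public class ThirdPartyFsExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Map the URI scheme to the vendor's FileSystem subclass (placeholder class name).
            conf.set("fs.gpfs.impl", "com.example.gpfs.GeneralParallelFileSystem");
            // Optionally make it the default filesystem for this client.
            conf.set("fs.defaultFS", "gpfs://cluster0/");

            FileSystem fs = FileSystem.get(URI.create("gpfs://cluster0/"), conf);
            System.out.println("Working directory: " + fs.getWorkingDirectory());
            fs.close();
        }
    }
    ```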

  • Chris walked us through the HCFS doc. We focused on the latest version in the tree as opposed to the published doc on the website; the difference is not crucial, so relying on the published doc should be fine. Test cases are organic, and we decided to prioritize the written tests over the written contracts. Let's figure out language for the spec that talks about both, though. See ODPI-170 for details. skipIfUnsupported can be used to skip individual features during testing; there's also JUnit's AssumptionViolatedException. Updating the tests requires running them on a variety of different implementations (HDFS, S3, Azure). This is something to keep in mind for anybody working on upstream Hadoop JIRAs. Hadoop committers in general can help with running tests in environments that are not available to the original contributors.
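
    As a concrete illustration of the skip mechanism mentioned above: in plain JUnit, Assume.assumeTrue() raises AssumptionViolatedException when the condition fails, so the test is reported as skipped rather than failed. This is a simplified sketch rather than the contract-test base classes themselves, and fs.contract.supports-append is used here only as an example flag:

    ```java
    import static org.junit.Assume.assumeTrue;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Test;

    // Simplified sketch of feature-conditional testing; the real contract suite
    // wraps this pattern in skipIfUnsupported() and its per-filesystem contract files.
    public class AppendFeatureTest {
        @Test
        public void testAppendIfSupported() throws Exception {
            Configuration conf = new Configuration();
            // Example flag name only; a real run would load the filesystem's contract definition.
            boolean supportsAppend = conf.getBoolean("fs.contract.supports-append", false);

            // Throws AssumptionViolatedException when false, so JUnit reports a skip, not a failure.
            assumeTrue("append not supported by this filesystem", supportsAppend);

            FileSystem fs = FileSystem.get(conf);
            Path p = new Path("/tmp/append-probe.txt");
            fs.create(p, true).close();
            fs.append(p).close();
            fs.delete(p, false);
        }
    }
    ```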

  • Roman: how do we incorporate these tests into the ODPi certification? Does ODPi-compatible mean HCFS-compatible, or do we keep those separate? More details in ODPI-171.

  • We need to test the CLI, especially its output (e.g., the output of the ls command on S3 is different). We need to spec out that output. More details in ODPI-172.
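
    One rough way to make the CLI output testable (a sketch only, not the agreed ODPi harness) is to drive FsShell programmatically and capture stdout, so the ls output can be compared against whatever format the spec ends up mandating:

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.PrintStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.util.ToolRunner;

    // Sketch of capturing `hadoop fs -ls` output for spec conformance checks.
    // The expected format/assertions are left open; ODPI-172 is meant to define them.
    public class LsOutputCapture {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            ByteArrayOutputStream captured = new ByteArrayOutputStream();
            PrintStream original = System.out;
            System.setOut(new PrintStream(captured, true, "UTF-8"));
            try {
                // Path is an example; on S3/Azure the listing fields can differ from HDFS.
                ToolRunner.run(conf, new FsShell(), new String[] {"-ls", "/"});
            } finally {
                System.setOut(original);
            }

            System.out.println("Captured ls output:\n" + captured.toString("UTF-8"));
            // A real test would assert on columns such as permissions, replication,
            // owner, group, size, modification time, and path.
        }
    }
    ```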

  • Roman showed how, potentially, one can hook up HCFS contract tests supplied by the vendor to an ODPi Bigtop-driven test suite. All you need to do is provide entry points here: https://github.com/odpi/bigtop/tree/odpi-master/bigtop-tests/smoke-tests/hcfs. What was used to demo it was a series of dummy classes extending existing S3 test classes: https://github.com/apache/hadoop-common/tree/branch-2/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/s3n
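
    Those dummy classes follow the standard contract-test pattern: a concrete test class extends one of the abstract contract tests and only supplies the filesystem-specific contract. Below is a hedged sketch of what a vendor would drop in; the vendorfs scheme, contract XML name, and class names are placeholders, while the abstract base classes are the Hadoop contract-test ones (exact signatures should be checked against the branch in use):

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.contract.AbstractBondedFSContract;
    import org.apache.hadoop.fs.contract.AbstractContractCreateTest;
    import org.apache.hadoop.fs.contract.AbstractFSContract;

    // Placeholder contract for a vendor filesystem; "vendorfs" and the contract XML
    // name are hypothetical, while the base classes are the Hadoop contract-test ones.
    class VendorFSContract extends AbstractBondedFSContract {
        public static final String CONTRACT_XML = "contract/vendorfs.xml";

        VendorFSContract(Configuration conf) {
            super(conf);
            // Pull in the vendor's declared feature flags (supports-append, etc.).
            addConfResource(CONTRACT_XML);
        }

        @Override
        public String getScheme() {
            return "vendorfs";
        }
    }

    // The "dummy class" a vendor drops into the Bigtop smoke-test entry points:
    // it inherits all create() test cases and only wires in the vendor contract.
    public class TestVendorFSContractCreate extends AbstractContractCreateTest {
        @Override
        protected AbstractFSContract createContract(Configuration conf) {
            return new VendorFSContract(conf);
        }
    }
    ```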

Published agenda:

Our current plan is to deliver the first publicly consumable artifact of the next release of ODPi by Hadoop World Strata NY (9/26-9/30). This gives us a total of 6 weeks, which is more like 4 weeks of development time plus 2 weeks of testing time for the individual member companies.

I would like to invite all the parties who plan to directly contribute to the HCFS specification effort to a "Sprint Planning" session so that we can lock down the exact scope and start tracking it via ODPi JIRA.

At this point, here's what I suggest we try to accomplish:

  1. Have a guided walkthrough of the Apache Hadoop HCFS spec and its corresponding test suite

  2. During #1, keep track of the gaps in HCFS scope and coverage that come up relative to the use cases identified by ODPi members. Record all of these gaps as ODPi JIRAs. Note that this requires a certain amount of homework to be done ahead of time by the member companies reviewing the spec and tests.

  3. Given the high-level scope outlined by Scott in his original email in this thread, come up with a skeletal section/sub-section plan for the Runtime Spec as far as HCFS concerns go. Start tracking completion of the various subsections as ODPi JIRAs.

  4. Have a demo/POC of the existing HCFS test suite hooked up to the current ODPi test harness. Identify the gaps and start tracking them as ODPi JIRAs.

  5. Talk through missing test cases/missing implementations for custom filesystems and file ODPi JIRAs to track work on developing the missing functionality.

For #1, it would be really great if somebody from the HDFS core team (Chris? Matt?) could lead the session in the style of a guided paper study group. I can provide input and demos on how to integrate these tests into the ODPi test suite as well. Please volunteer to be a guide by directly replying to this email.
