
DAOSGCP-83 and DAOSGCP-84 Automatic format storage with pool and container creation that supports ACLs #40

Merged
merged 10 commits into from
May 13, 2022


@lsitkiew (Contributor) commented May 9, 2022

No description provided.

@cboneti (Collaborator) left a comment

I believe you still need to provide a way for clients that do not have a DAOS image to mount DAOS.

In a classic NFS setup, you would need two outputs: install-nfs (installing the nfs-common drivers) and mount.
In DAOS, you would need to:

  1. install the DAOS RPMs
  2. copy all the YAML configuration files
  3. launch the DAOS services

Then, later, mount DAOS using dfuse.
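A minimal sketch of those client-side steps, written in Python as a dry-run command list rather than as the Terraform startup scripts this PR actually changes. Assumptions: a yum-based image with the DAOS repository already configured, the `daos-client` package, the default `/etc/daos` configuration paths, and `dfuse` option names as given in the DAOS 2.x documentation (check them against the installed version). `client_setup_commands`, `run`, and `config_src` are hypothetical names.

```python
import shlex
import subprocess
from typing import List

# Default DAOS configuration paths on a client.
AGENT_YML = "/etc/daos/daos_agent.yml"
CONTROL_YML = "/etc/daos/daos_control.yml"  # only needed if dmg is run from the client


def client_setup_commands(config_src: str, pool: str, container: str,
                          mountpoint: str) -> List[List[str]]:
    """Return the ordered commands that turn a plain VM into a DAOS client.

    config_src is a hypothetical directory holding the YAML files copied
    from the server deployment.
    """
    return [
        # 1. Install the DAOS client RPMs (assumes yum and a configured DAOS repo).
        ["yum", "install", "-y", "daos-client"],
        # 2. Copy the YAML configuration files into /etc/daos.
        ["cp", f"{config_src}/daos_agent.yml", AGENT_YML],
        ["cp", f"{config_src}/daos_control.yml", CONTROL_YML],
        # 3. Enable and start the DAOS agent.
        ["systemctl", "enable", "daos_agent"],
        ["systemctl", "start", "daos_agent"],
        # 4. Later, mount the container with dfuse (DAOS 2.x option names assumed).
        ["mkdir", "-p", mountpoint],
        ["dfuse", "--mountpoint", mountpoint, "--pool", pool,
         "--container", container],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

`run(client_setup_commands("/tmp/daos-config", "pool1", "cont1", "/mnt/daos"))` only prints the commands; `dry_run=False` executes them (as root). Keeping the steps as data makes the required order (install, configure, start `daos_agent`, then mount) easy to review.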

terraform/modules/daos_server/main.tf (outdated, resolved)

@Kmannth (Contributor) left a comment

@lsitkiew There are a few changes in this ticket that go beyond 84.

  1. Please move the changes to pool types into the 83 workstream. 83 will bring in a complete data type with everything it needs. It is better to get this landed and build the 83 work on top of it.
  2. I am not sure about moving the daos_agent startup into the client startup script. I think some existing logic relies on it being started as it is today in some use cases. Can we revert this change for now? Or ...?
  3. The IO500 auto-pool seems like a good idea. Can we wait for 83 to fully roll out the syntax that default pools need in the examples and IO500, and keep this pull request limited to the functionality? There are default policies to discuss.
  4. Please put the IO500 server instance changes in their own ticket so they can be reviewed in context.

This patch looks close. Thanks for the quick rebase.

terraform/examples/daos_cluster/variables.tf (outdated, resolved)
terraform/examples/io500/config/config.sh (outdated, resolved)
terraform/examples/io500/config/config.sh (outdated, resolved)
terraform/examples/io500/config/config.sh (outdated, resolved)
terraform/modules/daos_server/variables.tf (outdated, resolved)
@lsitkiew marked this pull request as draft May 10, 2022 19:56

@lsitkiew (Contributor, Author) commented:

We decided to combine changes from DAOSGCP-83 and DAOSGCP-84 into this single PR.

@lsitkiew changed the title from "DAOSGCP-84 Automatic format storage with pool and container creation" to "DAOSGCP-83 and DAOSGCP-84 Automatic format storage with pool and container creation that supports ACLs" May 10, 2022
…ainer creation that supports ACLs

Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
@lsitkiew marked this pull request as ready for review May 12, 2022 14:39
@lsitkiew requested review from cboneti, Kmannth and markaolson and removed request for markaolson May 12, 2022 14:39

@lsitkiew (Contributor, Author) commented:

I rebased the changes on top of the current develop branch.

Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
@Kmannth (Contributor) left a comment

There are three major issues:

  1. Showing "reclaim:disabled" to any user is a no-go for me.
    This is a developer-only feature of DAOS.
  2. The new pool type is defined in 3(?) places. Let's find a way to have a single pool type.
  3. There is a new script called pool_cont_create, but it also formats DAOS.

Minor issue:
After reading the patch, I don't know how to use the ACL feature. I think I need to add my user name in some places, but I will have to read the DAOS manual.

There are also several other things outside of 84/83 in this patch, but let's just leave them here for now.

# tier_ratio = 3
# acls = [
# "A::OWNER@:rwdtTaAo",
# "A:G:GROUP@:rwtT"
A contributor commented:

It would be better to show an actual user name here. It is not clear to me where to add my user name.

A contributor commented:

Maybe a comment with example would be better?
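A small sketch of how a per-user entry could sit next to the `OWNER@`/`GROUP@` examples, assuming the DAOS ACE format `TYPE:FLAGS:IDENTITY:PERMISSIONS`, where named users and groups are written as `name@` and group identities carry the `G` flag. `make_ace` is a hypothetical helper, `alice` is a placeholder user, and the permission-letter set is an assumption based on the DAOS ACL documentation; verify it against your DAOS version.

```python
# Special identities reserved by DAOS ACLs.
SPECIAL = {"OWNER@", "GROUP@", "EVERYONE@"}
# Assumed permission letters (r, w, c, d, t, T, a, A, o) from the DAOS ACL docs.
PERM_LETTERS = set("rwcdtTaAo")


def make_ace(identity: str, perms: str, group: bool = False) -> str:
    """Return an allow ("A") ACE string such as "A::alice@:rw"."""
    if not perms or set(perms) - PERM_LETTERS:
        raise ValueError(f"unsupported permission letters: {perms!r}")
    if identity not in SPECIAL and "@" not in identity:
        # Named users/groups are written as name@ (optionally name@domain).
        identity += "@"
    # GROUP@ and named groups carry the G flag; users, OWNER@ and EVERYONE@ do not.
    flags = "G" if group or identity == "GROUP@" else ""
    return f"A:{flags}:{identity}:{perms}"


acls = [
    make_ace("OWNER@", "rwdtTaAo"),  # "A::OWNER@:rwdtTaAo"
    make_ace("GROUP@", "rwtT"),      # "A:G:GROUP@:rwtT"
    make_ace("alice", "rw"),         # "A::alice@:rw" (placeholder user name)
]
```

The user name goes in the IDENTITY field, between the second and third colons; the same strings could then be pasted into the commented `acls` list.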

terraform/examples/daos_cluster/variables.tf (outdated, resolved)
@@ -38,4 +38,6 @@ systemctl start daos_server
systemctl enable daos_agent
A contributor commented:

Fine, I will retest. In my testing, a single server needs a few seconds of sleep before dmg commands stop erroring out.

A contributor commented:

Please put the 5-second sleep back. dmg cannot connect for the first few seconds, and the startup logic gets messed up:

May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Created symlink from /etc/systemd/system/multi-user.target.wants/daos_agent.service to /u
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + systemctl start daos_agent
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + set -x
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'BEGIN: DAOS server format'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: BEGIN: DAOS server format
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + [[ daos-server-0001 == \d\a\o\s-\s\e\r\v\e\r-\0\0\0\1 ]]
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg network scan
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + grep --fixed-strings daos-server-0001
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ERROR: dmg: 1 host had errors
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: daos-server-0001 the server at daos-server-0001:10001 refused the connection
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'All DAOS Servers started'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: All DAOS Servers started
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'Formatting storage on servers: daos-server-0001'
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Formatting storage on servers: daos-server-0001
May 13 06:56:04 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg storage format
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Format Summary:
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Hosts SCM Devices NVMe Devices
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ----- ----------- ------------
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: daos-server-0001 1 2
May 13 06:56:05 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + dmg system query -v
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: Rank UUID Control Address Fault Domain State Reas
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: ---- ---- --------------- ------------ ----- ----
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: 0 d1745607-575d-4295-b4e8-37d60e60b11e 10.128.0.54:10001 /daos-server-0001 Joined
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script:
May 13 06:56:37 daos-server-0001 google_metadata_script_runner[1371]: startup-script: + echo 'Done formating DAOS server'
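In the log above, `dmg network scan` is refused, yet the script still reports "All DAOS Servers started" and moves on to `dmg storage format`. Besides a fixed `sleep 5`, one option is to poll until dmg gets an answer, with a timeout. A minimal Python sketch, assuming `dmg network scan` exits non-zero when a host refuses the connection (not verified here); `dmg_responds` and `wait_until` are hypothetical helpers, not part of this PR.

```python
import subprocess
import time
from typing import Callable


def dmg_responds() -> bool:
    """True if `dmg network scan` succeeds (assumes a non-zero exit on host errors)."""
    try:
        result = subprocess.run(["dmg", "network", "scan"],
                                capture_output=True, text=True)
    except OSError:  # dmg missing or not executable on this machine
        return False
    return result.returncode == 0


def wait_until(check: Callable[[], bool], timeout: float = 60.0,
               interval: float = 1.0) -> bool:
    """Call check() every `interval` seconds until it is True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Used as `wait_until(dmg_responds, timeout=60)` before the format step, this waits only as long as daos_server actually needs, at the cost of a little more startup logic than a fixed sleep.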

terraform/modules/daos_server/variables.tf (resolved)
@Kmannth previously approved these changes May 12, 2022
Signed-off-by: Mark A. Olson <mark.a.olson@intel.com>
Mark A. Olson and others added 3 commits May 13, 2022 00:41
Signed-off-by: Mark A. Olson <mark.a.olson@intel.com>
Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
Signed-off-by: Łukasz Sitkiewicz <lukasz.sitkiewicz@intel.com>
@lsitkiew requested a review from Kmannth May 13, 2022 14:20
@markaolson previously approved these changes May 13, 2022

@markaolson (Contributor) left a comment

After testing with and without the HPC Toolkit, this PR looks good to me.

Signed-off-by: Mark A. Olson <mark.a.olson@intel.com>
@Kmannth (Contributor) left a comment

Please land this. It tested well for me yesterday.

@markaolson self-requested a review May 13, 2022 16:08

@markaolson (Contributor) left a comment

This PR has been tested with and without the HPC Toolkit. It's working as intended. Excellent work @lsitkiew

@markaolson merged commit bbce926 into daos-stack:develop May 13, 2022
@lsitkiew deleted the DAOSGCP-84 branch May 13, 2022 16:32