
Cloud Cloning What's That


The h2o-nodes.json file is auto-generated by one of two means. The first is during a python cloud build (for instance h2o/py/cloud.py), as used in the testdir_release jenkins runs.

You can reuse an h2o-nodes.json (no need to rebuild) if you know the cloud you build is the same each time, but it's also easy to have the file regenerated from the built cloud.
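For the first method, the hook is create_json=True on build_cloud(); that's what the -ccj help text further down refers to. Here's a minimal sketch of a test that uses it (the boilerplate follows the usual h2o python test pattern, but the details are illustrative rather than copied from any one test):

    import unittest
    import h2o

    class Basic(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # create_json=True makes build_cloud() write h2o-nodes.json,
            # so a later run can clone this cloud instead of rebuilding it
            h2o.build_cloud(node_count=3, create_json=True)

        @classmethod
        def tearDownClass(cls):
            h2o.tear_down_cloud()

    if __name__ == '__main__':
        h2o.unit_main()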

The second is by pointing h2o/py/find_cloud.py at a running cloud, given at a minimum a one-line "flatfile" that has an ip and port. It can have multiple ip and port lines, just like a flatfile, and find_cloud.py will interrogate them all. This is useful for checking whether the cloud you thought you had actually clouded up like it should.

The second method is used when running python tests against a cloud that was built by the h2o-on-hadoop methods tomk has. (Note that, nicely, the file tomk creates can be fed to find_cloud.py with the -f argument.)
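For reference, such a flatfile is nothing special: one ip:port per line (the addresses and the filename below are made up for illustration), handed to find_cloud.py with -f:

    192.168.1.176:54321
    192.168.1.176:54323

    ./find_cloud.py -f my_hadoop_flatfile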

-h gives the hint:

 ~/h2o/py$ ./find_cloud.py -h
usage: find_cloud.py [-h] [-f FLATFILE]

Creates h2o-node.json for cloud cloning from existing cloud

optional arguments:
  -h, --help            show this help message and exit
  -f FLATFILE, --flatfile FLATFILE
                        Use this flatfile to start probes defaults to
                        pytest_flatfile-<username> which is created by python
                        tests

With the -ccj <h2o-nodes.json> method below, any python test will work (it will subvert its cloud build into a cloud clone).

With the testdir_release style of python tests, which never builds a cloud, there is no need to say -ccj if the default h2o-nodes.json exists in the directory. You can still use -ccj if you want, or if you renamed the file.
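So for instance (a hypothetical session; the test name just follows the testdir_release naming style), with h2o-nodes.json already sitting in the directory:

    ~/h2o/py/testdir_release$ python test_c1_rel.py

No -ccj needed; the test picks up the default h2o-nodes.json and runs against the already-running cloud.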

Running any test with -h tells you about -ccj. See the -ccj argument at the bottom of the help:

Test: test_rf1.py    command line: python test_rf1.py -h
Python runs on: 192.168.1.80
usage: test_rf1.py [-h] [-bd] [-b] [-v] [-ip IP] [-cj CONFIG_JSON] [-dbg]
                   [-rud] [-s RANDOM_SEED] [-bf] [-slp] [-aai]
                   [-ccj CLONE_CLOUD_JSON]
                   [unittest_args [unittest_args ...]]

positional arguments:
  unittest_args

optional arguments:
  -h, --help            show this help message and exit
  -bd, --browse_disable
                        Disable any web browser stuff. Needed for batch.
                        nosetests and jenkins disable browser through other
                        means already, so don't need
  -b, --browse_json     Pops a browser to selected json equivalent urls.
                        Selective. Also keeps test alive (and H2O alive) till
                        you ctrl-c. Then should do clean exit
  -v, --verbose         increased output
  -ip IP, --ip IP       IP address to use for single host H2O with psutil
                        control
  -cj CONFIG_JSON, --config_json CONFIG_JSON
                        Use this json format file to provide multi-host
                        defaults. Overrides the default file
                        pytest_config-<username>.json. These are used only if
                        you do build_cloud_with_hosts()
  -dbg, --debugger      Launch java processes with java debug attach
                        mechanisms
  -rud, --random_udp_drop
                        Drop 20 pct. of the UDP packets at the receive side
  -s RANDOM_SEED, --random_seed RANDOM_SEED
                        initialize SEED (64-bit integer) for random generators
  -bf, --beta_features  enable or switch to beta features (import2/parse2)
  -slp, --sleep_at_tear_down
                        open browser and time.sleep(3600) at tear_down_cloud()
                        (typical test end/fail)
  -aai, --abort_after_import
                        abort the test after printing the full path to the
                        first dataset used by import_parse/import_only
  -ccj CLONE_CLOUD_JSON, --clone_cloud_json CLONE_CLOUD_JSON
                        a h2o-nodes.json file can be passed (see
                        build_cloud(create_json=True). This will create a
                        cloned set of node objects, so any test that builds a
                        cloud, can also be run on an existing cloud without
                        changing the test

An actual example

$ python test_rf_predict3_iris.py -ccj h2o-nodes.json



[ h2o-nodes.json ]
-----------------------------------
{
    "h2o_nodes": [
        {
            "sandbox_error_was_reported": false,
            "sandbox_ignore_errors": false,
            "http_addr": "127.0.0.1",
            "node_id": 0,
            "hdfs_name_node": "192.168.1.176",
            "use_hdfs": true,
            "port": 54321,
            "hdfs_version": "cdh3",
            "delete_keys_at_teardown": true,
            "redirect_import_folder_to_s3_path": false,
            "remoteH2O": true,
            "username": "0xcustomer",
            "java_heap_GB": 0,
            "use_maprfs": false,
            "redirect_import_folder_to_s3n_path": false
        }
    ],
    "cloud_start": {
        "username": null,
        "ip": null,
        "cwd": null,
        "config_json": null,
        "time": null,
        "python_cmd_line": null,
        "python_test_name": null
    }
}
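To see what the clone step amounts to, here is a minimal sketch of my own (not code from the repo) that assumes only the file shape shown above; the real code builds full node objects, this just walks the entries:

    import json

    with open('h2o-nodes.json') as f:
        cfg = json.load(f)

    # one cloned node object per h2o_nodes entry;
    # here we just print what would be cloned
    for n in cfg['h2o_nodes']:
        print 'cloning node %d at %s:%d' % (n['node_id'], n['http_addr'], n['port'])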

sandbox/commands.log (tests assume the cloud build cleaned/created the sandbox)

There's a detail: the sandbox directory needs to exist for a test, since tests write to sandbox/commands.log. I'm adding that to find_cloud.py, but for now you can just

 mkdir -p sandbox

wherever you are, if a test complains about sandbox/commands.log. (You may want to delete and recreate the directory to make sure you start with a clean commands.log; that's what find_cloud.py will do for you too, once I can push the change.)
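If you want the delete/create behavior right now, it's only this much python (a sketch of what find_cloud.py will do; nothing h2o-specific in it):

    import os, shutil

    # start from a clean sandbox so commands.log begins empty
    if os.path.exists('sandbox'):
        shutil.rmtree('sandbox')
    os.mkdir('sandbox')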

find_cloud.py example

In this case, my "found cloud" had 3 jvms on the same ip address. find_cloud.py warns about that, because I was interested in cases where the h2o-on-hadoop cloud was putting multiple jvms on the same node.

(It uses the pytest_flatfile-<username> in the current dir by default.) In this example that file has:

192.168.1.80:54325
192.168.1.80:54321
192.168.1.80:54323

But you could have created it yourself and passed it with the -f argument

./find_cloud.py
Starting with contents of  pytest_flatfile-kevin
Start http://192.168.1.80:54325/Cloud.json
Added node 192.168.1.80:54321 to probes
Added node 192.168.1.80:54323 to probes
Added node 192.168.1.80:54325 to probes
Start http://192.168.1.80:54321/Cloud.json
Start http://192.168.1.80:54323/Cloud.json
Start http://192.168.1.80:54325/Cloud.json
Start http://192.168.1.80:54321/Cloud.json
Start http://192.168.1.80:54323/Cloud.json

We did 12 tries

len(probe): 6
Checking for two h2os at same ip address
..<some snipped stuff around the warning that doesn't usually happen>
WARNING: appears to be 3 h2o's at the same IP address

Writing h2o-nodes.json
Cleaning sandbox, (creating it), so tests can write to commands.log normally
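The probing above is simple in spirit: hit Cloud.json on every node you know about, and follow every node each response reports until nothing new shows up. A sketch of that idea (the 'nodes' and 'name' fields are my assumption about the Cloud.json schema, not verified against it, and the real script also retries, hence the "We did 12 tries" above):

    import json, urllib2

    def probe(ip_port, seen):
        # ask this node for its view of the cloud, then follow
        # every node it reports, until no new ip:port shows up
        if ip_port in seen:
            return
        seen.add(ip_port)
        cloud = json.load(urllib2.urlopen('http://%s/Cloud.json' % ip_port))
        for n in cloud['nodes']:
            # assuming node names look like "/192.168.1.80:54321"
            probe(n['name'].lstrip('/'), seen)

    seen = set()
    probe('192.168.1.80:54325', seen)
    print '%d nodes found' % len(seen)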