Training and Experiments

Once the project is set up, you can run an experiment with a single command and monitor it with another.

Run an Experiment

$ roro run -size S2 python train.py
starting job 5b39921e

$ roro logs 5b39921e
loading iris-dataset
training the decision tree model
accuracy is 0.96
saving the model

$

The roro run command reads the configuration file roro.yml, automatically creates a runtime, installs the required dependencies, and submits the training job to a remote machine. The training job runs in the background, so you can launch another experiment in parallel. Use the roro logs command to check the logs from a training run.
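
If you want to script submissions rather than type them by hand, the session above can be wrapped in a few lines of Python. This is only a sketch: the helper names build_run_command, parse_job_id and submit are hypothetical, and it assumes that roro run prints a line of the form "starting job <JOB_ID>" to standard output, as in the sample session.

```python
import re
import subprocess


def build_run_command(script, size="S2"):
    """Return the argument list for `roro run -size <size> python <script>`."""
    return ["roro", "run", "-size", size, "python", script]


def parse_job_id(output):
    """Extract the job ID from output such as 'starting job 5b39921e'.

    Assumption: the line format matches the sample session shown above.
    Returns None when no such line is found.
    """
    match = re.search(r"starting job (\S+)", output)
    return match.group(1) if match else None


def submit(script, size="S2"):
    """Submit a training job and return its ID.

    Requires the roro CLI on PATH; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        build_run_command(script, size),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)
```

Calling submit("train.py") twice in a row would start two jobs side by side, since each job runs in the background. The returned IDs can then be passed to roro logs.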

Machine Size and GPU

You can specify the size of the machine for a run by changing the value passed to the -size flag. For a GPU machine, use -size G1. Make sure your program is set up to use the GPU; for example, its device setting should be something like /gpu:0.
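
One way to keep the same script usable on both CPU and GPU machines is to make the device a command-line option. The sketch below is an illustration, not part of roro: the --device option and the function names are hypothetical, and the TensorFlow usage is mentioned only in a comment so that the snippet runs without TensorFlow installed.

```python
import argparse


def parse_args(argv=None):
    """Parse the (hypothetical) --device option of a training script."""
    parser = argparse.ArgumentParser(
        description="Training script with a selectable device"
    )
    parser.add_argument(
        "--device",
        default="/cpu:0",
        help="device string, for example /gpu:0 on a G1 machine",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(f"training on {args.device}")
    # With TensorFlow, the model would be built and trained inside
    # a `with tf.device(args.device):` block at this point.
    return args.device
```

Assuming arguments after the script name are passed through to the program unchanged, a GPU run would then look like roro run -size G1 python train.py --device /gpu:0.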

Use the model management APIs to store experiment metadata, version your runs, and compare results across runs.

The process for submitting a training job from the CLI on your local machine is the same as above.

Abort an Experiment

You can abort a long-running experiment using the roro stop command. You can still inspect its logs afterwards using roro logs <JOB_ID>.

$ roro stop 5b39921e

You can see the entire history of job submissions using roro ps -a.

Feedback

Help us improve the documentation. Flag errors and issues, or request how-tos, guides, and tutorials, in the #documentation channel on our Slack.