Training and Experiments
Once you have the project set up, you can run and monitor experiments with single commands.
Run an Experiment
$ roro run -size S2 python train.py
starting job 5b39921e
$ roro logs 5b39921e
loading iris-dataset
training the decision tree model
accuracy is 0.96
saving the model
$
The roro run command reads the configuration file roro.yml, automatically creates a runtime, installs the required dependencies, and submits the training job to a remote machine. The training job runs in the background, so you can launch another experiment in parallel right away. Use the roro logs command to check the logs from the training run.
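If you launch several experiments from a script, it can help to capture each job ID as it is printed. The following is a minimal sketch, not part of the roro tooling: the helper names parse_job_id and submit are hypothetical, and it assumes that roro run prints a line of the form "starting job <JOB_ID>" on standard output, as in the session above.

```python
import re
import subprocess

# Assumption: `roro run` prints a line like "starting job 5b39921e",
# as in the example session. The exact output format may differ.
JOB_LINE = re.compile(r"starting job\s+(\w+)")


def parse_job_id(output):
    """Return the job ID found in `roro run` output, or None if absent."""
    match = JOB_LINE.search(output)
    return match.group(1) if match else None


def submit(script, size="S2"):
    """Submit `python <script>` with `roro run` and return the job ID.

    Hypothetical helper: it only wraps the command line shown above.
    """
    result = subprocess.run(
        ["roro", "run", "-size", size, "python", script],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_job_id(result.stdout)
```

Calling submit("train.py") would then return a job ID that you can pass to roro logs, provided the output format matches the assumption above.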
Machine Size and GPU
You can specify the size of the machine you want for the run by changing the value passed to the -size flag. For a GPU, use -size G1. Ensure that your program is set up to use a GPU; for example, the device flag in your code must select a GPU rather than the CPU.
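How the device is selected depends on your training code and framework. As one possible illustration, a training script could expose the choice as a command-line option. The sketch below is an assumption for illustration only: the --device flag, its values, and the parse_args helper are hypothetical and not defined by roro.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options for a hypothetical train.py."""
    parser = argparse.ArgumentParser(description="Train a model")
    # Hypothetical flag: the name and allowed values are illustrative only.
    parser.add_argument(
        "--device",
        choices=["cpu", "gpu"],
        default="cpu",
        help="device on which to run training",
    )
    return parser.parse_args(argv)
```

With a script written this way, a GPU run might be submitted as roro run -size G1 python train.py --device gpu, and the script would pass the parsed value on to whatever framework it uses.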
Use the model management APIs to store experiment metadata, version your runs, and compare results across runs.
The process for submitting a training job from the CLI on your local machine is the same as above.
You can abort a long-running experiment using the roro stop command. You can still inspect its logs afterwards using roro logs <JOB_ID>.
$ roro stop 5b39921e
You can see the entire history of job submissions using roro ps -a.