In this tutorial, I will walk through how to use Amazon EC2 to train machine learning models inexpensively. Because I am on a tight budget, I use this approach to work on Kaggle competitions and to experiment with deep learning techniques. The tutorial is aimed at Windows users.
To keep costs down, get everything ready for training before you launch an EC2 instance. The files you need to prepare may include, but are not limited to, the training data, the IPython script, and a list of dependencies. That way, once the instance is running, you can jump straight into training with little extra work.
I use a Spot instance because it is 60-70% cheaper. However, unlike an on-demand instance, a Spot instance cannot be stopped and resumed when needed. You are therefore charged continuously from the moment you create the instance until the moment you terminate it. That is why you need to make sure everything is ready before you start using it.
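To make the trade-off concrete, here is a small sketch of the arithmetic. The hourly price and the number of hours are made-up values for illustration, not real AWS prices; only the 60-70% discount range comes from the text above.

```python
def spot_cost(on_demand_hourly, hours, discount):
    """Estimated cost of running a spot instance for `hours`.

    `discount` is the fraction saved versus on-demand (0.6 means 60% cheaper).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_hourly * (1 - discount) * hours


# Hypothetical numbers for illustration only (not actual AWS prices).
on_demand_hourly = 1.00  # USD per hour
hours = 10               # time from launch to termination, idle time included

on_demand_total = on_demand_hourly * hours
best_case = spot_cost(on_demand_hourly, hours, 0.70)
worst_case = spot_cost(on_demand_hourly, hours, 0.60)
print(f"on-demand: ${on_demand_total:.2f}")
print(f"spot:      ${best_case:.2f} to ${worst_case:.2f}")
```

Note that `hours` counts every hour the instance exists, not just the hours spent training, which is why setup time on a running instance is money lost.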
To get started, register for AWS and log in to your EC2 dashboard. Click "Spot Requests", and then "Request Spot Instances". A new configuration page will show up.
EC2 dashboard
Configuration page
There are many fields on this page, but I normally leave most of them as they are, except for the fields below:
- AMI: I choose Ubuntu 14.04 or another Linux AMI, but the software configuration is up to you.
- EBS volume: I change the size to 30GB, which is within the free tier.
- Security group: This lets you connect to the instance remotely. If you have never created a security group before, you need to create a new one:
  - Click "create new security group" on the right. It will take you to a new page. Then click "create security group".
  - Add a name and description. Select the inbound tab, insert the rules following the image below, and then click "create".
  - Go back to the configuration page, and select the security group that you just created.
- Key pair name: If you don't have an existing key pair, create a new one by following the instructions below.
- Maximum price: Select "Set your max price (per instance/hour)" and set the maximum price you want to pay.
After finishing the configuration, click "launch". You will see your new instance in the instance tab of the EC2 dashboard.
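The same request can also be scripted instead of clicked through. The sketch below assumes the boto3 library, which this tutorial does not otherwise use. It only builds the arguments for the EC2 client's `request_spot_instances` call, so you can inspect them without an AWS account. The AMI ID, instance type, key name, security group ID, price, and root device name are all hypothetical placeholders that you would replace with your own values.

```python
def build_spot_request(ami_id, instance_type, key_name, security_group_id,
                       max_price, volume_gb=30, device_name="/dev/sda1"):
    """Return keyword arguments for boto3's EC2 `request_spot_instances`.

    `device_name` is an assumption: the root device name depends on the AMI.
    """
    return {
        "SpotPrice": str(max_price),  # the API takes the price as a string
        "InstanceCount": 1,
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
            "KeyName": key_name,
            "SecurityGroupIds": [security_group_id],
            "BlockDeviceMappings": [
                {
                    "DeviceName": device_name,
                    "Ebs": {"VolumeSize": volume_gb,
                            "DeleteOnTermination": True},
                },
            ],
        },
    }


# Placeholder values, for illustration only.
request = build_spot_request(
    ami_id="ami-xxxxxxxx",
    instance_type="p2.xlarge",
    key_name="my-key-pair",
    security_group_id="sg-xxxxxxxx",
    max_price=0.25,
)
print(request)

# To actually submit it (needs boto3 installed and AWS credentials configured):
#   import boto3
#   boto3.client("ec2").request_spot_instances(**request)
```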
After creating the new instance, we need to set up the environment for training. There is already a great walkthrough on the internet. Please follow this walkthrough.
Apart from installing Anaconda and Jupyter, you can install other libraries such as LightGBM and HDBSCAN according to your needs.
# For example
conda install -c conda-forge lightgbm
pip install hdbscan
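Since every minute on a running instance is billed, it can help to confirm that the libraries are importable before you start training. This is a minimal sketch using only the standard library; the list of names is an example and should match whatever your own script imports.

```python
import importlib.util


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example list: replace with the packages your training script imports.
required = ["lightgbm", "hdbscan"]

missing = missing_packages(required)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All required packages are importable.")
```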
I normally use the FileZilla application to transfer files between my local machine and the EC2 instance. I also use Google Drive to store very large files, and download them to the instance with the Python gdown library.
- Download the FileZilla Client and install the program.
- Open the program and open the settings as shown in the picture below.
- Select "SFTP" and "add key file" (you can reuse the key file generated in step 1.1), then hit "ok".
- Click the top-left icon (Open the Site Manager) and click "New Site". Fill in the public DNS and user name.
- Click "connect", which will give you access to the files on your instance.
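If you prefer a script to the FileZilla GUI, the same transfer can be done over SSH. The sketch below is an alternative, not part of the steps above. It assumes the OpenSSH `scp` command is available on your machine and that your key is in a format `scp` accepts (a `.pem` file rather than PuTTY's `.ppk`). The key path, user name, host, and file paths are placeholders.

```python
import subprocess


def scp_upload_command(key_file, local_path, user, host, remote_path):
    """Build the scp command that copies a local file to the instance."""
    return ["scp", "-i", key_file, local_path, f"{user}@{host}:{remote_path}"]


def upload(key_file, local_path, user, host, remote_path):
    """Run the scp command; raises CalledProcessError if the copy fails."""
    command = scp_upload_command(key_file, local_path, user, host, remote_path)
    subprocess.run(command, check=True)


# Placeholder values: use your own key file, public DNS, and paths.
command = scp_upload_command(
    key_file="my-key-pair.pem",
    local_path="train.csv",
    user="ubuntu",
    host="ec2-xx-xx-xx-xx.compute.amazonaws.com",
    remote_path="/home/ubuntu/train.csv",
)
print(" ".join(command))
```

Only `scp_upload_command` runs here, so you can check the command before calling `upload` against a real instance.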
- Install the gdown library: pip install gdown
- Open a Jupyter notebook and write the following code:
  import gdown
  url = 'https://drive.google.com/uc?id=[ID]'
  output = '[filename]'
  gdown.download(url, output, quiet=False)
- Replace [ID] and [filename] in the code above. To obtain the ID, open Google Drive, right-click the file, and select "get shareable link". You will then get a link that contains the ID of the document. Make sure that the file is set as Anyone with the link...
- Run the code.
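Copying the ID out of the shareable link by hand is easy to get wrong, so here is a small helper sketch that does it for you. It handles the two link shapes I am aware of (`/file/d/<ID>/view` and `open?id=<ID>`); if Google Drive gives you a differently shaped link, it raises an error rather than guessing. The function names and the example ID are illustrative.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_file_id(share_link):
    """Extract the file ID from a Google Drive shareable link."""
    parsed = urlparse(share_link)
    # Shape 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/file/d/([^/]+)", parsed.path)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"could not find a file ID in {share_link!r}")


def gdown_url(share_link):
    """Build the 'uc?id=' URL that the gdown snippet expects."""
    return "https://drive.google.com/uc?id=" + drive_file_id(share_link)


# Example with a made-up ID.
link = "https://drive.google.com/file/d/EXAMPLE_ID_123/view?usp=sharing"
print(gdown_url(link))
```

The returned string can be passed as `url` to `gdown.download`.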
After setting up the environment and transferring all files to the instance, it is time to run the code to train the model.
After training the model, I normally use pickle to store the trained model, or tf.train.Saver to save the session for TensorFlow:
import pickle

# Serialize the trained model to a file in binary mode
with open("myModel.pkl", 'wb') as pickle_file:
    pickle.dump(model, pickle_file)
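For completeness, loading the model back on the local machine is the mirror image of saving it. The sketch below does a full round trip, with a plain dictionary standing in for a trained model and a temporary directory so that nothing is left behind. The helper names are illustrative. Keep in mind that unpickling a real model requires the library that defines it to be installed locally as well.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize a trained model to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model previously written by save_model."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained model: any picklable object works the same way.
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.5}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "myModel.pkl")
    save_model(model, path)
    restored = load_model(path)

print("round trip ok:", restored == model)
```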
Then I transfer the model back to my local machine through FileZilla.
Finally, I clean things up by terminating the instance so that the charges stop.
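Termination can also be scripted so that it is never forgotten. The sketch below assumes a boto3 EC2 client is passed in; the function name and both IDs are hypothetical. It cancels the spot request first and then terminates the instance, because cancelling a request does not by itself terminate a running instance, and a persistent request that is left open can launch a replacement.

```python
def shut_down(ec2_client, instance_id, spot_request_id):
    """Cancel the spot request, then terminate the instance."""
    ec2_client.cancel_spot_instance_requests(
        SpotInstanceRequestIds=[spot_request_id]
    )
    ec2_client.terminate_instances(InstanceIds=[instance_id])


# Usage (needs boto3 installed and AWS credentials configured):
#   import boto3
#   shut_down(boto3.client("ec2"), "i-xxxxxxxx", "sir-xxxxxxxx")
```

It is still worth checking the EC2 dashboard afterwards to confirm that the instance state is "terminated".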
And that's it. It may seem that there are many steps to set up the environment. However, once you are familiar with the process, setup is quick and you can save 60-70% of the cost!