Up in the Clouds

Johann Baydeer
Published in Geek Culture
Mar 4, 2021 · 10 min read

I was recently working on a data science group project about telling real news from fake news using natural language processing. The other group members (Elliot and Brian) and I had a large dataset of recent news articles, around 100 megabytes. It was really interesting to have all this data on hand, but we quickly realized that one of the drawbacks was the time it took to process it. Even before modeling, we had to run tokenization, stemming, sentiment analysis, and part-of-speech tagging, and each of those steps took about an hour. What a nightmare! I remember the loud noise my computer made churning through those rows. And it wasn’t just noisy: it was also slow whenever I tried to browse or run other programs at the same time.

I now realize how important hardware is in data science, big data, and even software engineering. Having a computer you can rely on is essential! The thing is, we weren’t going to buy new computers for this one project or stop using ours just to get the preprocessing of our data done faster. This is where cloud computing comes into play! In this article, I want to show how you can set up a remote computer and run tasks on it from your local machine. I also want to spare you the fan noise I had to endure; I know you will thank me later!
There are many cloud service providers; the main ones are Microsoft Azure, IBM Cloud, Google Cloud, and Amazon Web Services (AWS). Prices vary across companies, but I want to show you how to set up an AWS account and get an instance running without spending a dime by staying within their free tier.

1. Create an AWS account

The first step is pretty easy! We need to go to the AWS address: https://aws.amazon.com/ and create an account:

Next, you will be prompted to enter your email address, choose a password and an account name:

The second step of the registration is similar: you provide your name, address, and phone number. You also need to specify whether the account is intended for commercial purposes or personal projects.

The third step might scare you off a little (it did scare me at first): you need to enter your credit card information for verification purposes. Again, I aim to demonstrate the free tier option, so there shouldn’t be any charges as long as you follow my directions and stay within the free tier limits:

Almost done! The fourth step is a basic phone number verification, like setting up an Uber account. You enter your phone number, receive a code, and type it into the AWS website along with a captcha:

Last step of the registration, I promise! This is where you choose a support plan for your account. You have the choice between the Basic, Developer, and Business support plans; we will go with the Basic option, which is free:

We now have a registered AWS account. Once you are logged in, you have access to the AWS Management Console, which lists the many services available. We want to launch a virtual machine, so the option to select here is EC2:

2. Setting up the Security Group

Once we select the EC2 option, we land on the EC2 dashboard, where we can configure security and launch an instance. I have to emphasize how important security is in cloud computing: you don’t want to leave your data accessible to everybody, especially when you are working for a company. Before anything else, we’ll create a security group, which acts as a virtual firewall controlling the traffic coming into and going out of your server. After you select the EC2 service, the Security Groups option is in the navigation pane on the left, under Network & Security:

There is a default security group, but you’ll want to create your own so you can control exactly who is allowed to connect, especially if several people will be using this server. (The key pair file, which I will talk about in a second, is created separately.)

You can see in the screenshot above that I already created a security group called “clockwork-orange” (and yes, that was a Stanley Kubrick reference). There is also the default security group at the bottom, but let’s create a new one so I can walk you through the process. Once you click the Create security group button, you will be prompted to enter a name and a description for your security group.

This is also the step where you configure the inbound and outbound rules, in other words who has access to your server. Each rule combines a type of traffic (SSH, for instance, means TCP port 22) with a source IP address or range, so each person who needs access to your virtual computer would be added here with their IP address. We will leave it open to anyone for this demonstration, but in a real project this is the key step where you only allow the people you work with. So first, we’ll add an inbound rule, select SSH as the type, and set the source to Anywhere:

We just gave ourselves access to the instance by adding the SSH rule. Next is the outbound rule, which covers traffic leaving the instance. Here we’ll leave it set to Anywhere, which also lets the instance download packages later. Security groups are stateful, so replies to our inbound SSH connection are allowed out regardless; in a work environment, you might still restrict outbound traffic to the destinations the server actually needs. The next step is to click the Create security group button at the bottom of the page:
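For reference, the SSH rule from the console can also be written down as data. The sketch below is a minimal illustration, assuming the IpPermissions structure that the EC2 API (and boto3’s authorize_security_group_ingress) accepts; boto3 is not part of the original walkthrough, the helpers ssh_ingress_rule and single_host_cidr are hypothetical names, and 203.0.113.7 is a reserved documentation address standing in for a teammate. It contrasts the open 0.0.0.0/0 rule used in this demo with the tighter one-address-per-person rule you would want at work.

```python
import ipaddress

# The console's "SSH" rule type stands for TCP port 22.
SSH_PORT = 22


def ssh_ingress_rule(cidr: str = "0.0.0.0/0", description: str = "SSH") -> dict:
    """Build one inbound SSH rule in the IpPermissions shape used by the EC2 API."""
    # Validate the range; strict=True rejects values like "10.0.0.5/24" with host bits set.
    network = ipaddress.ip_network(cidr, strict=True)
    ranges_key = "IpRanges" if network.version == 4 else "Ipv6Ranges"
    cidr_key = "CidrIp" if network.version == 4 else "CidrIpv6"
    return {
        "IpProtocol": "tcp",
        "FromPort": SSH_PORT,
        "ToPort": SSH_PORT,
        ranges_key: [{cidr_key: str(network), "Description": description}],
    }


def single_host_cidr(ip: str) -> str:
    """Turn one address (e.g. a teammate's public IP) into a /32 or /128 range."""
    addr = ipaddress.ip_address(ip)
    prefix = 32 if addr.version == 4 else 128
    return f"{addr}/{prefix}"


# Demo setting from this article: SSH open to anyone.
open_rule = ssh_ingress_rule()
# Tighter alternative: one rule per collaborator (documentation address, not a real one).
teammate_rule = ssh_ingress_rule(single_host_cidr("203.0.113.7"), "teammate")
print(open_rule)
print(teammate_rule)
```

To actually apply a rule you would pass it as IpPermissions=[rule] together with your security group’s GroupId; that call needs boto3 and AWS credentials, so it is left out of the sketch.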

3. Create a Key Pair

That’s it! We have created our security group! Now we need to create a key pair: a matching public and private key used to connect to your instance. AWS places the public key on the instance, and you keep the private key. Once you create a key pair, you will receive a .pem file containing the private key, which you use to authenticate yourself, almost like a password. Let’s create a key pair before we launch an instance:

The process is the same here: you click the Key Pairs tab, then “Create key pair”, and assign it a name:

I already created a key pair before, as you can see in the screenshot above, but we are creating a new one. The next step is to give the key pair a name and choose the .pem format, which is the one to use with SSH from a Mac or Linux terminal (the .ppk option is for PuTTY):

Well, it seems like Amazon only speaks one language! But let’s not be offended; we can still go along with the process. Once you click the Create key pair button, a download will start and you will receive the private key as a .pem file. AWS does not keep a copy, so this is your only chance to download it. It is also very important to remember where you store it if you move it out of your Downloads folder, since you will need its path to connect to your instance.
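If you would rather script this step than click through the console, boto3 (the AWS SDK for Python, not used in the original walkthrough) can create the key pair and hand back the private key text, which you then save yourself. This is a sketch, assuming boto3 is installed and AWS credentials are configured for the create_key_pair call; save_private_key and create_key_pair are hypothetical helper names. The file is created readable only by its owner from the start, and O_EXCL stops it from overwriting an existing key.

```python
import os
import tempfile
from pathlib import Path


def save_private_key(key_material: str, path: str) -> Path:
    """Write PEM text to a new file that only its owner can read (mode 0o400)."""
    target = Path(path).expanduser().resolve()
    target.parent.mkdir(parents=True, exist_ok=True)
    # O_EXCL fails if the file already exists, so an old key is never overwritten.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_material)
    os.chmod(target, 0o400)  # set the mode explicitly instead of relying on the umask
    return target


def create_key_pair(name: str, path: str) -> Path:
    """Hypothetical helper: ask EC2 for a new RSA key pair and save its private key.

    Requires boto3 and configured AWS credentials; AWS keeps no copy of the private key.
    """
    import boto3  # imported lazily so the rest of this sketch runs without boto3

    ec2 = boto3.client("ec2")
    response = ec2.create_key_pair(KeyName=name, KeyType="rsa")
    return save_private_key(response["KeyMaterial"], path)


# Offline demo with placeholder text (not a real key) in a temporary directory.
demo_path = save_private_key(
    "-----BEGIN RSA PRIVATE KEY-----\nplaceholder\n-----END RSA PRIVATE KEY-----\n",
    os.path.join(tempfile.mkdtemp(), "projects", "demo-key.pem"),
)
print(demo_path, oct(demo_path.stat().st_mode & 0o777))
```

Writing the key into a dedicated folder (here a made-up projects directory) also gives you a stable path to reuse in the ssh command later.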

4. Instance configuration

I moved my .pem file to a directory called projects in the library. If you are like me, you might be feeling that this is kind of a long setup, but now comes the exciting part: it’s time to launch an instance! This takes several steps; the first one is to go to the Instances tab and click the Launch instances button:

Next, we choose the operating system, in the form of an Amazon Machine Image (AMI). Ubuntu, a Linux-based operating system, is a very common choice. The only advice I got from my instructors was to never choose Windows; go figure!

Then we pick an instance type, which determines the CPU, RAM, and other resources of the machine. Your choice will depend on your project and the size of your organization, and prices vary accordingly. I will stick with the default setting, which is free tier eligible, and select Configure Instance Details:

In the next step, you can choose how many instances to launch, so several machines could each take a share of a task. How cool is that? Let’s stick with 1 for demonstration purposes:

The next part is to add storage. Once again, I will stick to the default settings, but many options are available. We don’t click Review and Launch right away because we still want to get to the Configure Security Group step:

Free tier customers can get up to 30 GB of storage. That’s interesting! Next is the Add Tags step, but we can skip it since it is optional and go to the Configure Security Group step, where we select the existing security group we created above (yes, up there!):

Now it’s time to review and launch our instance. Amazon will warn us about our security settings since we allowed any IP address to connect (just for demonstration!). We can go ahead and launch our instance for now:

We will be asked to choose the key pair we created earlier and to acknowledge that we have the private key file, since we need it to connect to our instance:

Let’s congratulate ourselves! We now have an instance running; our remote computer is up!
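The launch wizard choices above (one instance, an Ubuntu image, a free-tier-sized machine, default storage, our own security group and key pair) map onto the parameters of EC2’s RunInstances call. The following is a sketch, assuming boto3’s run_instances; launch_params and launch are hypothetical helpers, the AMI and security group IDs are placeholders, t2.micro is assumed as the free-tier-eligible type, 8 is assumed as the default volume size, and /dev/sda1 is assumed to be the root device name of the Ubuntu image (check your AMI before relying on it).

```python
FREE_TIER_STORAGE_GB = 30  # the free tier storage limit mentioned above


def launch_params(ami_id: str, key_name: str, security_group_id: str,
                  instance_type: str = "t2.micro", count: int = 1,
                  volume_gb: int = 8) -> dict:
    """Keyword arguments for boto3's run_instances that mirror the console choices."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count * volume_gb > FREE_TIER_STORAGE_GB:
        # Rough guard only: the allowance is shared by all your volumes, not just these.
        raise ValueError("requested storage exceeds the free tier allowance")
    return {
        "ImageId": ami_id,                        # an Ubuntu AMI ID for your region
        "InstanceType": instance_type,            # assumed free-tier-eligible default
        "KeyName": key_name,                      # the key pair from step 3
        "SecurityGroupIds": [security_group_id],  # the security group from step 2
        "MinCount": count,                        # equal min and max: launch exactly `count`
        "MaxCount": count,
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",            # assumed root device name for Ubuntu AMIs
            "Ebs": {"VolumeSize": volume_gb, "DeleteOnTermination": True},
        }],
    }


def launch(params: dict) -> list:
    """Hypothetical helper: start the instances and return their IDs (needs boto3)."""
    import boto3  # imported lazily so launch_params works without boto3

    response = boto3.client("ec2").run_instances(**params)
    return [instance["InstanceId"] for instance in response["Instances"]]


# Placeholder IDs, not real resources.
params = launch_params("ami-0123456789abcdef0", "demo-key", "sg-0123456789abcdef0")
print(params["InstanceType"], params["MinCount"], params["MaxCount"])
```

Keeping the parameter building separate from the API call lets you inspect exactly what would be launched before spending anything.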

5. Connect to our instance using Terminal

Now that we have created and launched an instance, we need to connect to it from our terminal. Before we connect, we need to add a layer of security, since our key file may be readable by anyone who uses our computer, and SSH refuses to use a private key whose permissions are that open. For this reason, we’ll “chmod” it so that only we, the file’s owner, have access to it (the following instructions are for macOS, but Linux works the same way):

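The command here is sudo chmod 700 followed by the path of the key.pem file. It gives the owner full access (read, write, and execute) and removes every permission for everyone else; AWS’s own instructions use chmod 400, which leaves the owner read-only access, and either mode satisfies SSH. Because of sudo, you will be asked to enter your password (sudo isn’t strictly needed for a file you own), and voila! We added some needed extra security. The next part is to connect to our instance. First, we need to go back to the AWS console and click on our instance to copy its public IPv4 address, which is required for the connection: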
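Back to the key permissions: OpenSSH rejects a private key if the group or other users have any access to it (it reports the permissions as “too open”). Here is a small Python sketch of the same check and fix; key_is_too_open and lock_down_key are hypothetical names, and the default mode of 0o400 follows AWS’s recommendation rather than the 700 used in this article. Either mode passes the check.

```python
import os
import stat
import tempfile


def key_is_too_open(path: str) -> bool:
    """Mirror SSH's check: any permission bit for group or others is too open."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def lock_down_key(path: str, mode: int = 0o400) -> int:
    """Python equivalent of `chmod 400 <path>`; returns the resulting mode."""
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo on a throwaway file that starts out readable by everyone (0o644).
fd, demo_key = tempfile.mkstemp(suffix=".pem")
os.close(fd)
os.chmod(demo_key, 0o644)
before = key_is_too_open(demo_key)
after_mode = lock_down_key(demo_key)
after = key_is_too_open(demo_key)
print(before, oct(after_mode), after)  # True 0o400 False on macOS/Linux
```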

Now back to the terminal: the next command is ssh -i [path to your key file] ubuntu@[paste your copied IPv4 address here]. Press Enter, then type yes when SSH asks you to confirm the host’s fingerprint (this only happens the first time you connect):
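The connection command can also be assembled in code, which is handy because the public IPv4 address changes whenever a new instance is launched or a stopped one is started again. This is a sketch, assuming the response layout of EC2’s DescribeInstances (Reservations, then Instances, then PublicIpAddress); public_ip, ssh_command and connect are hypothetical helpers, the instance ID and key path are made up, and ubuntu is the default login user on Ubuntu AMIs.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def public_ip(describe_response: dict, instance_id: str) -> Optional[str]:
    """Find PublicIpAddress for one instance in a describe_instances-shaped dict."""
    for reservation in describe_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("InstanceId") == instance_id:
                return instance.get("PublicIpAddress")  # missing until one is assigned
    return None


def ssh_command(key_path: str, host: str, user: str = "ubuntu") -> list:
    """Argument list for `ssh -i <key> <user>@<host>`."""
    return ["ssh", "-i", str(Path(key_path).expanduser()), f"{user}@{host}"]


def connect(key_path: str, host: str) -> int:
    """Open an interactive session; returns ssh's exit status once you log out."""
    return subprocess.call(ssh_command(key_path, host))


# Example with made-up values (203.0.113.10 is a documentation address).
fake = {"Reservations": [{"Instances": [
    {"InstanceId": "i-0abc", "PublicIpAddress": "203.0.113.10"}]}]}
cmd = ssh_command("/tmp/demo.pem", public_ip(fake, "i-0abc"))
print(shlex.join(cmd))  # ssh -i /tmp/demo.pem ubuntu@203.0.113.10
```

Passing a list to subprocess avoids shell quoting problems if the key path contains spaces.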

And that’s it! We are connected to our remote computer! We can run the pwd (print working directory) command to confirm; on an Ubuntu instance it should print /home/ubuntu:

The working directory is no longer on our local machine! From there, the sky is the limit: we can install Python, Anaconda, and all the packages and libraries required for the project. I’ll save that for a second article so this one doesn’t get ridiculously long! For now, it is important to remember that typing exit only closes the SSH session; the instance itself keeps running, and is billed for that time once you are past the free tier allowance, until you terminate it. The best practice is to terminate it from the EC2 console once you are done with your task:

Once the instance shuts down, the SSH connection closes and your terminal reverts to your local machine!
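Terminating can be scripted too. The sketch below assumes boto3’s terminate_instances and the TerminatingInstances layout of its reply; terminate and summarize_termination are hypothetical helpers, and the sample reply uses a made-up instance ID. Termination is permanent (and by default deletes the root volume), so double-check the ID list before calling it.

```python
def summarize_termination(response: dict) -> dict:
    """Map instance ID to its new state name from a terminate_instances-shaped reply."""
    return {
        item["InstanceId"]: item["CurrentState"]["Name"]
        for item in response.get("TerminatingInstances", [])
    }


def terminate(instance_ids: list) -> dict:
    """Hypothetical helper: terminate instances for good (needs boto3 + credentials)."""
    import boto3  # imported lazily so the rest of this sketch runs without boto3

    ec2 = boto3.client("ec2")
    response = ec2.terminate_instances(InstanceIds=instance_ids)
    return summarize_termination(response)


# Shape of a typical reply with a made-up ID; "shutting-down" comes before "terminated".
sample = {"TerminatingInstances": [{
    "InstanceId": "i-0abc",
    "CurrentState": {"Code": 32, "Name": "shutting-down"},
    "PreviousState": {"Code": 16, "Name": "running"},
}]}
print(summarize_termination(sample))  # {'i-0abc': 'shutting-down'}
```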

That was a long process, but once you have done it, it becomes very simple. I want to continue on this subject and show how to run a Python script and a model remotely in the next article. For now, I hope this is helpful!


immersive data science bootcamp @ General Assembly