Deploying Shiny Apps or Interactive Documents with Shiny Server and AWS

Wednesday, 19 Jan 2022 38 min read R, AWS

Step 1: AWS EC2
Step 2: Connecting to AWS EC2
Step 3: Installing R, Rstudio Server, and Shiny Server
Step 4: Rstudio Server and IDE
- User Login
- User Library
Step 5: Shiny Server
- Configure Shiny Server
- Reverse Proxy
Step 6: Deployment
Step 7 (Recommended): Domain Name
Step 8 (Highly Recommended): Secure With HTTPS
Step 9 (Optional): Password Protect
- Configure Nginx
- User Login
Resources

Recently, I needed to deploy a flexdashboard that I had built. For Shiny applications or interactive documents, there are a few options for deployment and hosting, including shinyapps.io, RStudio Connect, and Shiny Server. The resource available to me was our AWS EC2 instance, so I decided to host our flexdashboard through Shiny Server on AWS EC2. In this post, I document the steps that I took and the resources that helped me better understand the deployment process, for future reference. This post will be updated as my understanding of AWS, Shiny, and application deployment improves over time. Note that I am a Mac user, which means I’ll be using the Terminal app for the initial EC2 setup.

Some resources that have helped me understand AWS and EC2 are the following:

Step 1: AWS EC2

If you are working for an organization that uses AWS EC2, the chances are that your data team or IT department may already have an EC2 instance running. In that case, consult your cloud manager or supervisor or whoever manages your organization’s AWS account regarding the following:

The root user could create your IAM user account, which gives you certain access rights.
You would need to connect to EC2 via a Secure Shell (SSH) using a Command Line Interface (CLI), and so you need to obtain the AWS EC2 .pem private key file.
You may also want to obtain the SSH commands that allow you to SSH into your organization’s EC2 instance.

The rest of the setup steps may differ quite a bit depending on whether or not you are using your organization’s EC2 instance or running your own. For the purpose of this post, however, we will create our own personal AWS account and EC2 instance. I find that practicing deploying an application using my own AWS account and EC2 instance has helped me ultimately set up the production environment on my organization’s EC2 instance. The first step, though, is to register for an AWS account, which is free of charge.

Launch an EC2 Instance and Select an AMI

Launch an EC2 instance by selecting an Amazon AMI.

Because many tutorials and resources online are based on Ubuntu, we will use the Ubuntu AMI. Not only is this option free tier eligible but, in my experience, it could also save us a lot of pain in dealing with system requirements later on. Ubuntu 20.04.1 LTS (the version available at the time of writing this post) is a well-documented operating system with a large user base, so troubleshooting is relatively easier in my experience than with an AMI such as the Amazon Linux AMI 1, which is based on Red Hat Enterprise Linux (RHEL). If you work for an organization, you may not be able to choose which AMI to use. The steps that follow should still work with other AMIs (for instance, we use the Amazon Linux AMI at our organization), but note that some of the commands will be different, so you may run into problems installing the system libraries and packages needed for deployment and even for R packages.

Choose an Instance Type

We will choose t2.micro, which is free tier eligible. Depending on your needs for computing resources (for instance, installing R packages with compiled code), you may run out of memory with 1 GiB of memory and 1 vCPU, so you could also consider other instance types. I recommend reading the following article to better understand the differences between instance types.

Configure Instance Details

We could leave this as default.

Add Storage

The default EBS volume size is 8 GB but we get up to 30 GB of General Purpose SSD via the free tier. See the documentation on EBS volume options.

Add Tags

Tags may be useful for organizing your AWS services. See the documentation for more on this.

Configure Security Group

Security groups function as virtual firewalls for your EC2 instances to control inbound and outbound traffic. By default, AWS blocks traffic from all ports except for port 22, which is the port we use to SSH into our instance. I use the following configuration based on mgritts’s article.

Type	Protocol	Port Range	Source	Description
SSH	TCP	22	Anywhere: 0.0.0.0/0, ::/0	SSH
HTTP	TCP	80	Anywhere: 0.0.0.0/0, ::/0	Use nginx to password protect and set up proxy
Custom TCP	TCP	3838	Anywhere: 0.0.0.0/0, ::/0	Default Shiny Server
Custom TCP	TCP	8787	Anywhere: 0.0.0.0/0, ::/0	Default RStudio Server

Since our instance is used as a web server, we add security rules that allow any IP address to reach it over HTTP or Custom TCP so that external users can browse the content on our web server.

The second rule allows inbound HTTP access from all IPv4 and IPv6 addresses.
The third and fourth rules allow inbound access on ports 3838 and 8787, the default ports for Shiny Server and RStudio Server.
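
For readers who prefer the command line over the console, the IPv4 part of the rules above could be expressed with the AWS CLI. The following is only a sketch: it prints the commands for review instead of running them, the security group ID is a made-up placeholder, and it assumes the AWS CLI is installed and configured if you decide to run the printed commands yourself.

```shell
#!/bin/sh
# Placeholder security group ID (hypothetical); replace with your own.
group_id="sg-0123456789abcdef0"

# Print one authorize-security-group-ingress command per inbound port.
# Nothing is sent to AWS; the commands are only echoed for review.
print_ingress_rules() {
    for port in 22 80 3838 8787; do
        echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr 0.0.0.0/0"
    done
}

rules="$(print_ingress_rules "$group_id")"
echo "$rules"
```

Printing first makes it easy to narrow the source range (for example, to your own IP address for port 22) before anything is applied.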

Key Pair

The last step in setting up an EC2 instance is to create your .pem private key file, or to select an existing key file provided by your organization.

Finally, launch your instance.

Elastic IP

An elastic IP address is different than our EC2 instance’s Public IPv4 address; in short, an Elastic IP address is allocated to our AWS account, and is ours until we release it. Therefore, this IP address can be reused for our EC2 instances. The re-usability of our IP may be useful when we want to upgrade or downgrade our EC2 instance type. Without an elastic IP address, a new Public IPv4 address will be used each time we stop and re-launch our instance. This means that any service that depends on our public IP will need to be updated. The benefit of an elastic IP address is that we can simply associate it to the new server. In other words, the elastic IP address allows us to mask the failure of an instance or software by rapidly remapping the address to a new instance in our account. The setup is as follows:

Select the Action drop down menu in the top right corner and choose Associate Elastic IP address. From now on, every time we make changes to our EC2 instance, we can simply re-associate this IP address to our new instance.

Step 2: Connecting to AWS EC2

Connecting via SSH

To connect to our EC2 instance via SSH, we will use the terminal (for Windows, the steps for PuTTY can be found here). When you select “Connect” in your AWS console, you should be taken to the following page:

Open the terminal, navigate to the location of our .pem key:

# Change working directory 
$ cd path_to_pem_file

Next, run the following command to ensure that our key is not publicly viewable:

$ chmod 400 file.pem
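
To see what mode 400 actually does, here is a small sketch that applies it to a temporary file (standing in for the real key, which is not touched) and reads the permission bits back with ls:

```shell
#!/bin/sh
# Temporary file standing in for the real .pem key (hypothetical).
key="$(mktemp)"

# Owner may read; group and others get no access at all.
chmod 400 "$key"

# The first ten characters of `ls -l` are the file type and permission bits.
perms="$(ls -l "$key" | cut -c1-10)"
echo "$perms"   # -r--------

rm -f "$key"
```

If the key is readable by group or others, ssh will normally refuse to use it, which is why this step comes before connecting.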

Connect to the instance:

$ ssh -i "file.pem" ubuntu@ec2-public-ip-address.compute-1.amazonaws.com
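
If you connect often, an entry in ~/.ssh/config can save some typing. The sketch below only prints such an entry; the alias shiny-ec2 is a name I made up, and the host name and key path are the same placeholders used in the command above.

```shell
#!/bin/sh
# Print an ssh config entry. Arguments: alias, host name, path to key.
ssh_stanza() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User ubuntu\n'
    printf '    IdentityFile %s\n' "$3"
}

# All three values are placeholders; substitute your own.
stanza="$(ssh_stanza shiny-ec2 ec2-public-ip-address.compute-1.amazonaws.com "$HOME/path_to_pem_file/file.pem")"
echo "$stanza"
```

After appending the printed lines to ~/.ssh/config, the connection command shortens to ssh shiny-ec2.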

If this is your first time connecting to your EC2 instance, you may receive an Are you sure you want to continue connecting (yes/no/[fingerprint])? prompt. Entering yes should successfully connect you to your EC2 instance:

Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 5.11.0-1022-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Sat Jan 29 01:27:58 UTC 2022

  System load:  0.0               Processes:             100
  Usage of /:   4.9% of 29.02GB   Users logged in:       0
  Memory usage: 21%               IPv4 address for eth0: 172.31.91.243
  Swap usage:   0%


1 update can be applied immediately.
To see these additional updates run: apt list --upgradable

Disconnecting

To disconnect from our instance:

$ exit

Upgrading and Installing System Packages

This particular step and many of the steps that follow are places where the operating system will begin to matter. The following commands are meant to work with the Ubuntu OS (a Debian-based Linux distribution). For instance, the Advanced Package Tool, or APT, used to handle the installation and removal of software, is developed for Ubuntu/Debian system software packages. For Red Hat-based Linux systems, the Yellowdog Updater, Modified, or YUM, package-management utility is used. In addition, R packages can depend on software external to the R ecosystem. On Ubuntu, for instance, in order to install the curl R package, we must first install a libcurl development library via apt-get (for example, libcurl4-gnutls-dev, which is included in the commands below). Resolving system dependency issues can be painful at times, and the pain points may vary based on the operating system (AMI). One effective way of troubleshooting, based on my own experience, is simply Google searching for system dependencies on an ad-hoc basis (after an error is thrown, for instance, when you try to install an R package). If you are lucky, you won’t be the first to crash because of a missing system library.

# Update commands
$ sudo apt update
$ sudo apt-get update -y
$ sudo apt-get dist-upgrade -y
# Install some system libraries
$ sudo apt-get -y install \
    nginx \
    gdebi-core \
    apache2-utils \
    pandoc \
    pandoc-citeproc \
    libssl-dev \
    libcurl4-gnutls-dev \
    libcairo2-dev \
    libgsl0-dev \
    libgdal-dev \
    libgeos-dev \
    libproj-dev \
    libxml2-dev \
    libxt-dev \
    libv8-dev \
    libhdf5-dev \
    git
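
Since the package manager differs between AMIs, a quick way to find out which one a machine offers is to look for the binary on the PATH. This is a minimal sketch; the function name is my own, and it only distinguishes the two tools discussed in this post.

```shell
#!/bin/sh
# Report which package manager is available on this machine.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt-get"
    elif command -v yum >/dev/null 2>&1; then
        echo "yum"
    else
        echo "unknown"
    fi
}

pkg_manager="$(detect_pkg_manager)"
echo "Package manager: $pkg_manager"
```

On the Ubuntu AMI this should report apt-get, and on a RedHat-based AMI it should report yum.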

The difference between apt-get and apt is that the former is an older command with more options while apt is a newer, more user-friendly command with fewer options. To understand these shell commands, I found explainshell.com (its github repo can be found here) to be extremely useful. Other resources that are also helpful include:

FreeBSD Manual Pages
Chapter 7 of Matt Dancho’s book

To be able to compile R packages, we also need to install the build-essential package, which provides the compilers and tools necessary for compiling software:

$ sudo apt install build-essential

On Ubuntu, you may run the following command to check on disk space:

$ df -h

If nginx is installed successfully, you should see the following page by entering your Public IPv4 address (obtained from Instance summary in your AWS console) into your web browser:

Step 3: Installing R, Rstudio Server, and Shiny Server

Installing R from CRAN

Because R updates frequently, the latest stable version isn’t always available from Ubuntu’s default repositories, and so we’ll need to add the external repository maintained by CRAN. To install the latest version of R from CRAN, the commands are as follows:

# Update indices
$ sudo apt update -qq
# Install two helper packages 
$ sudo apt install --no-install-recommends software-properties-common dirmngr
# Add the signing key (by Michael Rutter) for these repositories
# To verify key, run gpg --show-keys /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc 
# Fingerprint: E298A3A825C0D65DFD57CBB651716619E084DAB9
$ wget -qO- https://cloud.r-project.org/bin/linux/ubuntu/marutter_pubkey.asc | sudo tee -a /etc/apt/trusted.gpg.d/cran_ubuntu_key.asc
# Add the R 4.0 repo from CRAN; $(lsb_release -cs) expands to the release codename (for example, 'focal')
$ sudo add-apt-repository "deb https://cloud.r-project.org/bin/linux/ubuntu $(lsb_release -cs)-cran40/"

The instructions for installing R on other operating systems can be found here under “Download and Install R”. Finally, run the following command to install R:

# Install recommended packages
$ sudo apt install r-base

Or, install without considering recommended packages:

# Install without recommended packages
$ sudo apt install --no-install-recommends r-base

To check the R version:

$ R --version

Other useful commands are:

# Run R from the terminal
$ R 
# Quit (typed at the R prompt, not the shell)
> q()

If you are using an operating system that comes with another AMI, the stable version of R in its default repository may be different. For Amazon Linux AMI, the latest version of R is 3.x; depending on the R packages you need to install, this may or may not be troublesome.

Installing Rstudio Server

To download the latest version of Rstudio server, use the following link and select your Linux platform. The official instructions are easy to follow; use the following commands to install Rstudio server on Ubuntu 20 (current at the time of writing this post):

$ sudo apt-get install gdebi-core
$ wget https://download2.rstudio.org/server/bionic/amd64/rstudio-server-2021.09.2-382-amd64.deb
$ sudo gdebi rstudio-server-2021.09.2-382-amd64.deb

Installing Shiny Server

Similarly, install the Shiny R package and the latest version of Shiny Server by following the instructions on this page.

# Install shiny
# This may take a while to compile on t2.micro
$ sudo su - \
-c "R -e \"install.packages('shiny', repos='https://cran.rstudio.com/')\""
# Install shiny server
$ sudo apt-get install gdebi-core
$ wget https://download3.rstudio.org/ubuntu-14.04/x86_64/shiny-server-1.5.17.973-amd64.deb
$ sudo gdebi shiny-server-1.5.17.973-amd64.deb

Checking installed

On Ubuntu, run the following commands:

$ cd ~
$ ls

You should see the downloaded .deb files for both Rstudio server and Shiny server in your home directory. On RedHat-based Linux distributions, you might use the following commands to check whether a package is properly installed:

# List installed packages
$ sudo yum list installed
# Use grep command to filter for specific package
$ sudo yum list installed | grep nginx
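
A check that does not depend on the package manager is to ask the shell whether each program can be found. The sketch below assumes that the installers place R, nginx, rstudio-server, and shiny-server on the PATH, which I believe is the default but have not verified on every AMI; the function name is my own.

```shell
#!/bin/sh
# Return success if the named command can be found on the PATH.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

for tool in R nginx rstudio-server shiny-server; do
    if is_installed "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

A "not found" here does not prove the software is missing, only that the shell cannot see it, so treat it as a hint for where to look next.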

Other resources that I have found useful are as follows:

Bash cheat sheet by Julien Le Coupanec
Linux and Ubuntu Terminal cheat sheet
YUM commands cheat sheet

If you enter http://<public-ipv4>:8787 and http://<public-ipv4>:3838 into your browser, you should see the following pages:

And

Install R packages (System Library)

One way to install R packages is to install them in a system-level or global library; this library is available for all users and roles of your EC2 instance. The syntax for installing R packages from CRAN within the terminal is as follows:

$ sudo su - -c "R -e \"install.packages(c('tidyverse', 'data.table'), repos='http://cran.rstudio.com/')\""

To install developmental versions of R packages from github:

$ sudo su - -c "R -e \"install.packages('devtools', repos='http://cran.rstudio.com/')\""
$ sudo su - -c "R -e \"devtools::install_github('tidyverse/ggplot2')\""

Note: As mentioned earlier, with the t2.micro instance type, you may simply run out of memory installing certain R packages with compiled code (for example, Rcpp and RcppArmadillo). If this happens, it would appear that the installation process has been stuck in a never-ending process.
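
Before starting a long installation, it can help to confirm how much memory the instance really has. On Linux this is reported in /proc/meminfo; the sketch below reads the MemTotal line and converts it to MiB. To keep the output predictable, the demonstration runs against a small sample file with a made-up figure.

```shell
#!/bin/sh
# Print total memory in MiB from a meminfo-style file (default: /proc/meminfo).
total_mem_mib() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Sample file with an invented value roughly matching a 1 GiB instance.
sample="$(mktemp)"
printf 'MemTotal:        1000000 kB\nMemFree:          200000 kB\n' > "$sample"

mem_mib="$(total_mem_mib "$sample")"
echo "Total memory: ${mem_mib} MiB"   # Total memory: 976 MiB
rm -f "$sample"
```

On the instance itself, calling total_mem_mib with no argument reads the real /proc/meminfo.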

Install R packages (User Library)

Alternatively, there is also the option to install add-on R packages (those that do not come with base R) in a user-level library, which may be appealing for many reasons. We will return to this once we set up the user credentials for Rstudio server.

Step 4: Rstudio Server and IDE

User Login

The RStudio Server enables you to provide a browser-based interface (the RStudio IDE) to a version of R running on a remote Linux server. The RStudio IDE can be accessed by entering http://<public-ipv4>:8787 into your browser. The login credentials are those of the users on your EC2 instance, which are stored in the /etc/passwd file. This file stores essential information about the users on the system. We can manage users on our EC2 instance using the following Linux commands.

In Ubuntu, there are two command-line tools that you can use to create a new user account: useradd and adduser. The former, useradd, is a low-level utility and adduser is a script written in Perl that acts as a friendly interactive frontend for useradd:

$ sudo adduser username

The command above will prompt you to enter the following information to set up the user:

Adding user `username' ...
Adding new group `username' (1001) ...
Adding new user `username' (1001) with group `username' ...
Creating home directory `/home/username' ...
Copying files from `/etc/skel' ...
New password: 
Retype new password: 
passwd: password updated successfully
Changing the user information for username
Enter the new value, or press ENTER for the default
	Full Name []: Your Name
	Room Number []: 
	Work Phone []: 
	Home Phone []: 
	Other []: 
Is the information correct? [Y/n] y

This will create the new user’s home directory, and copy files from /etc/skel to this directory. Within the home directory, the user can write, edit, and delete files and directories. To allow this user to be able to perform administrative tasks, add this existing user to the sudo group using usermod:

$ sudo usermod -a -G sudo username

Always use the -a (append) option when adding a user to a new group. If you omit the -a option, the user will be removed from any groups not listed after the -G option. On success, the usermod command does not display any output, but warns you if the user or group doesn’t exist.

In Ubuntu, you can use two commands to delete a user account: userdel and its interactive frontend deluser:

$ sudo deluser username

To delete the user along with its home directory and mail spool, use the --remove-home flag:

$ sudo deluser --remove-home username

Note that sometimes you may need to kill an R session before removing the user:

# Kill an individual session
$ sudo rstudio-server kill-session <pid>
# Force kill all running sessions
$ sudo rstudio-server kill-all

The session process ID can be obtained with the following base R function:

Sys.getpid()

To change password for a user:

$ sudo passwd username

To delete a user’s password (leaving the account without one) so that a new password can be set afterwards:

$ sudo passwd -d username

To see a list of all users, use the following commands:

# List users
$ cat /etc/passwd
$ cut -d: -f1 /etc/passwd
# Search for username using the grep command
$ grep username /etc/passwd
# Or
$ grep -w '^username' /etc/passwd
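
The full /etc/passwd listing is dominated by system accounts. To show only the accounts created for people, one option is to filter on the numeric user ID in the third field. This sketch assumes the Ubuntu default of assigning regular users IDs from 1000 upwards; the function name and the sample file contents are made up for the demonstration.

```shell
#!/bin/sh
# Print login names with UID >= 1000 from a passwd-style file
# (default: /etc/passwd), skipping the special "nobody" account.
list_regular_users() {
    awk -F: '$3 >= 1000 && $1 != "nobody" { print $1 }' "${1:-/etc/passwd}"
}

# Sample file so the demonstration does not depend on the local machine.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
ubuntu:x:1000:1000:Ubuntu:/home/ubuntu:/bin/bash
username:x:1001:1001:Your Name,,,:/home/username:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
EOF

users="$(list_regular_users "$sample")"
echo "$users"
rm -f "$sample"
```

With the sample file, only ubuntu and username are printed; on the instance, calling list_regular_users with no argument reads the real /etc/passwd.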

To see details about the file:

$ stat /etc/passwd

More information can be found via the Rstudio Server administration guide. Finally, logging into your Rstudio IDE should take you to the following GUI:

Some useful commands for managing rstudio-server:

$ sudo rstudio-server stop
$ sudo rstudio-server start
$ sudo rstudio-server restart

User Library

Once you have set up a user on your EC2 instance, which creates a home directory for the username, your user-level library will be set up as well, so there is nothing extra to do here. Log in to your Rstudio IDE and run the following function:

.libPaths()

This would return the following paths on Ubuntu:

[1] "/home/username/R/x86_64-pc-linux-gnu-library/4.1"
[2] "/usr/local/lib/R/site-library"                  
[3] "/usr/lib/R/site-library"                        
[4] "/usr/lib/R/library"

On RedHat (for instance, the Amazon Linux AMI 1), the output may be something like:

[1] "/home/username/R/x86_64-redhat-linux-gnu-library/3.4"
[2] "/usr/lib64/R/library"                                    
[3] "/usr/share/R/library"

The first path is always your user library, which means that running install.packages() in the Rstudio IDE will install the packages in that path. On Debian and Ubuntu, the R_LIBS_USER environment variable is set in /etc/R/Renviron.

R_LIBS_USER=${R_LIBS_USER-'~/R/$platform-library/R-version'}

where $platform is something like ‘x86_64-pc-linux-gnu’ and R-version is the major.minor version of R installed on your EC2 instance (for example, 4.1). The environment variable R_LIBS_SITE is set in /etc/R/Renviron to

R_LIBS_SITE=${R_LIBS_SITE-'/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library'}
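
The ${NAME-value} form used in these two settings is borrowed from the shell, where it means "use the current value of NAME if it is set, otherwise fall back to value". The sketch below shows the shell behaviour only; R’s own handling of Renviron files is described in its ?Startup help page, and the /opt/custom-library path is a made-up example.

```shell
#!/bin/sh
# Case 1: the variable is unset, so the fallback after "-" is used.
unset R_LIBS_USER
unset_case="${R_LIBS_USER-/home/username/R/x86_64-pc-linux-gnu-library/4.1}"
echo "$unset_case"

# Case 2: the variable already has a value, so the fallback is ignored.
R_LIBS_USER="/opt/custom-library"
set_case="${R_LIBS_USER-/home/username/R/x86_64-pc-linux-gnu-library/4.1}"
echo "$set_case"
```

This is why a value exported before R starts takes precedence over the default written in the file.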

We can access the environment variables via:

$ sudo nano /etc/R/Renviron

The R packages that are part of r-base and r-recommended are installed into the directory /usr/lib/R/library. The other R packages available as precompiled Debian packages, r-cran-* and r-bioc-*, are installed into /usr/lib/R/site-library. More information on Debian packages of R software can be found in the following article. For other operating systems, the location of these start-up files may be different, but the configuration files can be edited directly in the IDE:

# Install usethis
install.packages("usethis")
# Open configuration files
usethis::edit_r_environ()

Step 5: Shiny Server

The best resource available for Shiny server is the administrative guide, which covers the most important information from system requirements to server management to hosting models to security.

Configure Shiny Server

Important: The first thing, though, is to stop the server:

# Ubuntu
$ sudo systemctl stop shiny-server
# Redhat
$ sudo stop shiny-server

Other useful commands include:

# Ubuntu
$ sudo systemctl start shiny-server
$ sudo systemctl status shiny-server
$ sudo systemctl restart shiny-server
# Redhat
$ sudo start shiny-server
$ sudo status shiny-server
$ sudo restart shiny-server

To configure Shiny server, we need to modify the default configuration file located at /etc/shiny-server/shiny-server.conf using GNU nano.

$ sudo nano /etc/shiny-server/shiny-server.conf

This should open the default configuration file:

This configuration expects that your Shiny applications are located in the following path /srv/shiny-server/. For other hosting models, please see the following section of the administrative guide. There is one sample application in the path /srv/shiny-server/sample-apps/hello/. By default, Shiny Server listens (receives information) on port 3838, so the example application will be available at http://<server-address>:3838/sample-apps/hello/. I added the following directives to the configuration file (the list of all the directives that are supported in Shiny Server config files can be found here in section “7.2 Configuration Settings”):

I added the run_as directive followed by my username. For one, the paths in which R will look for packages (.libPaths()) are often user-dependent. Since the packages required to run a Shiny application are installed in my user-level library, I must run the application as the correct user. For locations configured with site_dir, the run_as setting will be used to determine which user should spawn the R Shiny processes. This setting can be configured globally, or for a particular server or location.
I added sanitize_errors off to report errors to the client. This is optional, since you could always check the log files located in /var/log/shiny-server using the less command.

$ cd /var/log/shiny-server
$ sudo less [file_name].log

I changed directory_index to off to disable the directory index page when a user visits the base URL, http://<public-ipv4>:3838.
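
When an application misbehaves, the log you want is usually the newest one in the log directory mentioned above. The following sketch picks the most recently modified .log file in a directory; the function name and the two file names in the demonstration are made up, and on the server you may need sudo to read /var/log/shiny-server.

```shell
#!/bin/sh
# Print the most recently modified .log file in a directory.
latest_log() {
    ls -t "$1"/*.log 2>/dev/null | head -n 1
}

# Demonstration in a temporary directory with two invented log files.
log_dir="$(mktemp -d)"
touch -t 202201010000 "$log_dir/old-app.log"
touch -t 202201020000 "$log_dir/new-app.log"

newest="$(latest_log "$log_dir")"
echo "$newest"
rm -rf "$log_dir"
```

On the instance, something like sudo less "$(latest_log /var/log/shiny-server)" would open the newest log directly.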

Reverse Proxy

A reverse proxy is an application that sits in front of back-end applications and forwards client requests to those applications. An analogy that helped me understand this better is to think of the server as a house that has many doors, which are called ports. Only the TCP ports on which some process is listening can answer a request. If we wish to obtain some information in the house (that is, the server), we must pass through one of these ports to retrieve the information from the information provider (a specific process, application, or service) associated with that port.

Shiny server is one such information provider, and it is located at port 3838. In order to reach Shiny server, we must specify the port number in the URL we enter into the browser: http://<server-address>:3838. If we do not specify the port number, or if we specify the incorrect number, we will not be able to obtain the data stream from Shiny server. The reverse proxy functions like a doorman at the main entrance of the house who brings us to the right information provider without our having to specify which door to pass through. It directs client requests to the appropriate back-end server.

In other words, we simply need to type http://<server-address> or http://<server-address>/* (* means any sub-path) into our browser to speak to the reverse proxy, which fetches the right information for us directly. In speaking to the reverse proxy, we do not have to specify the port number since we will have already configured the proxy to know exactly which port we want to reach. For this task, we will use nginx to set up the reverse proxy, which should already be installed on your EC2 instance.

To configure nginx, we first need to stop the service.

$ sudo service nginx stop
# Other useful commands
$ sudo service nginx start
$ sudo service nginx status

Next, we need to navigate to the directory where nginx is installed:

$ cd /etc/nginx
$ ls

The results of ls may differ, sometimes substantially, depending on the AMI (and the operating system) that we are using. For instance, on Ubuntu, the default installation of nginx might create a sites-available and a sites-enabled directory. On RedHat/CentOS/Fedora, the default installation of nginx does not include such directories. For those operating systems, the default place to store the configuration files is /etc/nginx/conf.d/, as files with the .conf extension. In addition, in the /etc/nginx/nginx.conf configuration file, we must ensure that the include /etc/nginx/conf.d/*.conf; directive is present in the http block to tell nginx to pull in any files in the /etc/nginx/conf.d directory that have the extension .conf. On Ubuntu, the following setup steps are needed:

Having navigated to the /etc/nginx directory, we should see at least the following sub-directories (if not, we can create them):

conf.d sites-enabled nginx.conf  sites-available

Navigate to the sites-available directory and create a new configuration file specifically for Shiny server:

$ cd sites-available
$ sudo nano shiny.conf

Write the following block of directives in the shiny.conf file:

server {
    # Listen on 80 port
    listen 80;
    # For IPv6 addresses
    listen [::]:80;
    # The reverse proxy
    location / {
        proxy_pass http://127.0.0.1:3838/;
        proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 20d;
        proxy_buffering off;
    }
}

When nginx proxies a request, it 1) sends the request (i.e., a client trying to access our shiny application or interactive document hosted on our EC2 instance) to a specified proxied server, 2) fetches the response, and 3) sends it back to the client. To understand the configuration above:

The proxy_pass directive passes all requests processed in location / to the proxied server at the specified address http://127.0.0.1:3838/. See more details on this here. Note also that the “/” prefix is used for matching requests. The location block above provides the shortest prefix (length one), and only if all other location blocks fail to provide a match will this block be used. Since we do not have any other location blocks at the moment, this one will be used.
The proxy_redirect directive does something like URL rewrite, replacing http://127.0.0.1:3838/ with variables $scheme://$host/. You can read more details on proxy_redirect here and on the components of a URL here. The full list of nginx variables can be found here.
The proxy_http_version directive sets the HTTP protocol version for proxying. By default, version 1.0 is used. More details here.
The two proxy_set_header directives set the Upgrade and Connection header fields, which are needed for WebSocket proxying. These headers have to be passed explicitly so that the proxied server can know the client’s intention to switch protocols to WebSocket.
By default, the connection will be closed if the proxied server does not transmit any data within 60 seconds. This timeout can be increased with the proxy_read_timeout directive; here it is set to 20 days. The configuration measurement units can be found here.
The final directive proxy_buffering turns response buffering off. Disabling response buffering is necessary for applications that need immediate access to the data stream according to the following article on nginx performance.

Next, we need to create a shortcut (symbolic link) inside the sites-enabled directory. The reason is that the /etc/nginx/nginx.conf configuration file includes only the sites-enabled directory, not sites-available. We create the .conf files inside sites-available and create a shortcut inside sites-enabled to access them. One benefit of this is that, to temporarily deactivate access to Shiny, you only have to delete the shortcut and not the actual configuration file in sites-available:

$ cd /etc/nginx/sites-enabled
# Use absolute path
$ sudo ln -s /etc/nginx/sites-available/shiny.conf /etc/nginx/sites-enabled/
# To remove a symbolic link
$ sudo rm your-site-config
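
To convince yourself that removing the link leaves the real configuration file alone, the pattern can be tried in a scratch directory first. This sketch recreates the two directories under a temporary path, so it needs no sudo and never touches /etc/nginx.

```shell
#!/bin/sh
# Recreate the sites-available / sites-enabled layout in a temporary directory.
root="$(mktemp -d)"
mkdir "$root/sites-available" "$root/sites-enabled"
echo "# placeholder config" > "$root/sites-available/shiny.conf"

# Enable the site: link the config into sites-enabled using an absolute path.
ln -s "$root/sites-available/shiny.conf" "$root/sites-enabled/"
[ -L "$root/sites-enabled/shiny.conf" ] && enabled="yes"

# Disable the site: remove only the link; the real file is untouched.
rm "$root/sites-enabled/shiny.conf"
[ -f "$root/sites-available/shiny.conf" ] && kept="yes"

echo "enabled=$enabled kept=$kept"   # enabled=yes kept=yes
rm -rf "$root"
```

The absolute path matters: a relative target is resolved from the directory that holds the link, which is an easy way to end up with a broken link.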

Finally, we need to add the following block to the configuration file located at /etc/nginx/nginx.conf as specified here. Note that you must add the following within the http block in the nginx configuration file:

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

You can use the following command to open the configuration file:

$ sudo nano /etc/nginx/nginx.conf

To test if the configuration files are syntactically correct, run the following:

$ sudo nginx -t

This should output the results below if the configuration test has passed:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Important: By default, there will be a default configuration file located in the sites-available and sites-enabled directories. We must also remove them:

$ cd /etc/nginx/sites-enabled
$ sudo rm default
$ cd /etc/nginx/sites-available
$ sudo rm default

Finally, restart nginx:

$ sudo service nginx start

Step 6: Deployment

Remove Example Shiny Files

To deploy your shiny application or interactive documents, we first need to remove the default index.html and sample-apps from the R Shiny server:

# Grant read, write, and execute permissions to all users
$ sudo chmod 777 /srv/shiny-server/
$ sudo rm /srv/shiny-server/index.html
$ sudo rm -rf /srv/shiny-server/sample-apps

For other chmod options, see Chapter 7 of Matt Dancho’s book.

Github

You could upload the source files of your shiny application or interactive document directly to your EC2 instance using the upload button in file pane of the Rstudio IDE:

However, in this post, we will opt to host the source files in a remote repository on github, and then clone said repository from within the Rstudio IDE on our EC2 instance. One huge benefit of this approach is version control, which allows us to easily keep track of changes to our source files over time. To set up the remote repository on github for the source files, you could read the following chapter of Matt Dancho’s book. Some resources that helped me learn more about Github and git version control in the past are:

Assuming that you now have a github repository containing all of your source files and their dependencies, the next step is to configure your git user.name and user.email in the Rstudio terminal using the following commands:

git config --global user.name 'Your Name'
git config --global user.email 'your_email@example.com'
git config --global credential.helper 'cache --timeout=10000000'
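
The timeout given to the credential cache above is expressed in seconds, which is hard to read at a glance. A quick conversion to days, using shell integer arithmetic:

```shell
#!/bin/sh
# Timeout passed to the credential cache, in seconds.
timeout_seconds=10000000

# Integer division by the number of seconds in a day.
timeout_days=$((timeout_seconds / 86400))
echo "$timeout_days days"   # 115 days
```

That is a little under four months, so pick a smaller number if you would rather be asked for your credentials more often.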

The third command tells git to cache your password for the next four months (about ten million seconds). For more details, please see the following tutorial by Jenny Bryan. Once you have configured your name and e-mail address in git, the following steps will clone the remote repository containing your source files from within the Rstudio IDE on your EC2 instance:

In your repository, click on the Code (green) button and copy the HTTPS URL in the drop-down menu.

Create a new project within your Rstudio IDE on your EC2 instance. Select Version Control.

Select “Clone a project from a Git repository” and enter your URL and the name of your git repository.

Once you create the project, you should see that your source files are located in the file pane of the IDE:

Deploying With Shiny Server

You are now ready to copy the source files on your EC2 to /srv/shiny-server/. Recall from the previous section that we are deploying through a hosting model called site_dir, which hosts the entire directory tree at /srv/shiny-server. Run the following functions to create a sub-directory within /srv/shiny-server/ and copy your source files from your EC2 instance into said sub-directory. The name of the directory can be anything as long as it is syntactically valid:

# To create a new sub-directory
dir.create(path = "/srv/shiny-server/portfolio_dashboard")
# Copy the source file into the directory created above
file.copy("dashboard.Rmd", "/srv/shiny-server/portfolio_dashboard")

If you get an error that the file you are seeking to copy does not exist, check to make sure that the first file path is specified correctly. If you created the project using the steps above, your working directory should be the project directory on your EC2 instance. Some other useful functions and commands are:

# To remove files
file.remove("/srv/shiny-server/portfolio_dashboard/dashboard.Rmd")
# To list files in a directory
list.files(path = "/srv/shiny-server/portfolio_dashboard/")

In the terminal (as in our EC2 instance and not the Rstudio IDE):

# To remove directories within /srv/shiny-server/
$ sudo rm -rf /srv/shiny-server/sub_directory
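
The two R calls above also have a shell equivalent, which can be handy when redeploying from the terminal. This is a sketch under assumptions: deploy_app is a hypothetical helper name, and it assumes your user can write to /srv/shiny-server (which the R calls above require as well). The optional third argument only exists so the function can be tried against a scratch directory first.

```shell
# Copy one source file into its own sub-directory of the Shiny Server site directory.
# deploy_app is a hypothetical helper; the site directory defaults to /srv/shiny-server.
deploy_app() {
    # $1: source file, $2: app (sub-directory) name, $3: optional site directory
    app_dir="${3:-/srv/shiny-server}/$2"
    mkdir -p "$app_dir" && cp "$1" "$app_dir/"
}

# Usage, run from the project directory:
# deploy_app dashboard.Rmd portfolio_dashboard
```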

Now, in your browser, enter http://<public-ipv4>/<sub-directory>/ (if you created a sub-directory within /srv/shiny-server/, e.g., http://<public-ipv4>/portfolio_dashboard/) or http://<public-ipv4> (if you simply copied the source files to /srv/shiny-server/). Your application or interactive document should be successfully deployed at your EC2’s elastic IP address:

Step 7 (Recommended): Domain Name

Google Domain

A domain name is simply the name of a website. Examples of domain names include google.com, wikipedia.org, and youtube.com. If we wish to use a domain name rather than the raw IPv4/elastic IP address of our EC2 instance (for instance, you work for an organization that is security-conscious about potentially exposing its IP address), we need to purchase a domain name. If you are lucky, chances are that your organization may already own a domain. But, for this post, we will purchase our own domain name. You may be tempted to look for free solutions, but remember the century-old adage:

There ain’t no such thing as a free lunch.

Plus, when it comes to domain names, the following statement may also be accurate:

Customers should not buy from people simply for cheaps, but buy from those they can trust.

Fortunately, there are many trusted domain registrars where domains are available at very reasonable prices. You could check the Forbes list for the best domain registrars of 2021. In this post, we will use Google Domains, which I have been using for my personal website, but you could also use any other registrar. The setup should be very similar for almost all domain registrars.

The first step is to navigate to Google Domains, and enter a domain into the search box:

You may wish to choose something that is not extremely well-known or popular, since those would likely have already been purchased.

Select your domain name and proceed to checkout. Note that you need a Google account in order to purchase a domain. Signing up for a Google account is free.

Finally, you should see your purchased domain name under My Domain to the left of the interface. For example, here are my purchased domain names:

DNS

The Domain Name System, DNS, is a system that … resolves domain names and IP addresses. It converts [humanly] readable domain names (e.g., www.google.com) into Internet Protocol (IP) addresses (e.g., 173.194.39.78). Computers can only communicate using series of numbers, so DNS was developed as a sort of “phonebook” that translates the domain you enter in your browser into a computer readable IP.

In short, the DNS is what makes it possible to navigate the web using domain names like google.com instead of having to remember and use the underlying IP addresses for a given website, such as 172.217.3.206. To point the purchased domain name to our application hosted on our EC2 instance, we need to create an entry of resource record via the DNS tab to the left of the interface:

This step should be almost identical irrespective of the domain registrar. To understand the set up above:

The Host name field specifies the domain, subdomain, or host. The default is @, which is the domain name that we have purchased. Since I want my main domain dashwu.com to point to my AWS server, I used @. If you wish to use a subdomain instead, you can enter something like subdomain in the box. Then, subdomain.dashwu.com will point to your AWS server.
The Type field specifies the type of the record. We wish to point the domain to the application hosted on our EC2 instance at our elastic IP address. IP addresses are numeric addresses. When a website is created, an A (IPv4) or AAAA (IPv6) record is used to define the IP address of the website host. Therefore, we choose A as our record type. To learn about other record types, see the Google Domains Help Page.
The Time-To-Live (TTL) field controls how long resolvers may cache the resource record before discarding it and fetching a fresh copy (the default is 1 hour).
Lastly, the Data field specifies the information stored in the record. This would be different depending on the record type. In this case, it would be our AWS elastic IP address, e.g., 123.123.123.123.
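
Before moving on, it can help to confirm that the A record actually resolves to the elastic IP address, since a new record may take a while to become visible. The sketch below assumes the dig utility is installed on the machine you run it from; resolves_to is a hypothetical helper name, and the domain and IP address in the usage comment are placeholders.

```shell
# Succeeds if any line on standard input is exactly the expected IPv4 address.
# resolves_to is a hypothetical helper name.
resolves_to() {
    # -F: fixed string, -x: match the whole line, -q: no output, exit status only
    grep -Fxq -- "$1"
}

# Usage (placeholders for the domain and the elastic IP address):
# dig +short your_domain_name A | resolves_to 123.123.123.123 && echo "A record is live"
```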

Configure Nginx

The last step is to add the server_name directive to the nginx configuration file located at /etc/nginx/sites-available/shiny.conf:

$ sudo service nginx stop
$ sudo nano /etc/nginx/sites-available/shiny.conf

The modified configuration file would be as follows (make sure you swap the placeholder your_domain_name with your own domain name):

server {
    # Listen on 80 port
    listen 80;
    # For IPv6 addresses
    listen [::]:80;
    # Server name
    server_name your_domain_name;
    # The reverse proxy
    location / {
        proxy_pass http://127.0.0.1:3838/;
        proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 20d;
        proxy_buffering off;
    }
}

You could read the documentation for details on the server_name directive.

# Restart nginx
$ sudo service nginx start
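
A typo in the configuration file will prevent nginx from starting, so it can be worth running nginx's built-in syntax test (nginx -t) first. The following is a small sketch; restart_nginx_if_valid is a hypothetical helper name, and it uses restart rather than start so that it behaves the same whether or not nginx was stopped beforehand.

```shell
# Restart nginx only if the edited configuration passes nginx's own syntax test.
# restart_nginx_if_valid is a hypothetical helper name.
restart_nginx_if_valid() {
    sudo nginx -t && sudo service nginx restart
}

# Usage, after saving /etc/nginx/sites-available/shiny.conf:
# restart_nginx_if_valid
```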

Step 8 (Highly Recommended): Secure With HTTPS

If you examine your web browser when accessing your application, you may see a flag that looks as follows:

This is because some web browsers, such as Chrome, flag HTTP websites as Not Secure in the URL bar. The flag is, in part, an effort to encourage web developers to switch to Hypertext Transfer Protocol Secure (HTTPS), which is the secure version of HTTP, the primary protocol used to send data between a web browser and a website. While HTTP is the foundation of the internet, data sent over HTTP is not encrypted, making it vulnerable to security risks such as man-in-the-middle attacks. With HTTPS, traffic is encrypted such that even if the data were to be intercepted, it would appear as unreadable characters.

For our purposes, this may be of concern to our clients or our bosses, so we should take care to address the problem. HTTPS uses an encryption protocol called Transport Layer Security (TLS). In order to switch from HTTP to HTTPS, we first need to obtain a TLS/SSL certificate, which is a data file hosted on a website’s server that contains the website’s public key and identity, along with other related information. Certificates are issued by certificate authorities (CAs), many of which charge for them. Instead, we will obtain our certificate from the Let’s Encrypt certificate authority, which offers digital certificates for free to anyone who owns a domain name.

Let’s Encrypt With Certbot

To obtain the TLS certificate, we will use Certbot, which is open source software that obtains Let’s Encrypt certificates and sets them up on websites in order to enable HTTPS. Proceed to the following instructions page to obtain the most up-to-date installation instructions for your EC2 instance. The choices for software and system should be:

Software: Nginx
System: Whatever AMI/OS you are running on your EC2

Previously, Certbot could be installed from the Certbot Personal Package Archive (PPA); the instructions now use snap instead. The first step is to install snapd. Conveniently, on Ubuntu 20.04.3 LTS (at the time of writing this post), Snap is pre-installed and ready to go out of the box. The following command ensures we have the latest version of snapd:

$ sudo snap install core; sudo snap refresh core

Install Certbot:

$ sudo snap install --classic certbot

This should return something like the following:

certbot 1.22.0 from Certbot Project (certbot-eff✓) installed

Use the following to create a symbolic link so that the certbot command can be run from /usr/bin:

$ sudo ln -s /snap/bin/certbot /usr/bin/certbot

There is a way to run Certbot so that it edits your nginx configuration automatically to serve it. But we just need to get a certificate, since we would like to configure nginx ourselves (certonly stands for “certificate only”):

# Using the --nginx flag
$ sudo certbot certonly --nginx

For more information, you could refer to the Certbot documentation. The program above will prompt you to enter an email address before confirming that you have successfully received a certificate:

The Certbot packages on your system come with a cron job or systemd timer that will renew your certificates automatically before they expire. To test automatic renewal:

$ sudo certbot renew --dry-run

This should return something like the following:

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Processing /etc/letsencrypt/renewal/dashwu.com.conf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Account registered.
Simulating renewal of an existing certificate for dashwu.com

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Congratulations, all simulated renewals succeeded: 
  /etc/letsencrypt/live/dashwu.com/fullchain.pem (success)

The command to renew certbot is installed in one of the following locations, which can be inspected as follows:

$ cat /etc/crontab
$ ls /etc/cron.*/
$ systemctl list-timers

The certificate and key are saved to:

# Become root user
$ sudo -i
$ cd /etc/letsencrypt/live/dashwu.com/
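
Independently of the renewal dry run, the remaining lifetime of the issued certificate can be checked with OpenSSL's -checkend option, which takes a number of seconds. This is a sketch under assumptions: cert_valid_for_days is a hypothetical helper name, and the usage line assumes you are root, as in the commands above.

```shell
# Succeeds if the certificate in file $1 is still valid $2 days from now.
# cert_valid_for_days is a hypothetical helper name; -checkend expects seconds.
cert_valid_for_days() {
    openssl x509 -checkend "$(( $2 * 86400 ))" -noout -in "$1" > /dev/null
}

# Usage (as root):
# cert_valid_for_days /etc/letsencrypt/live/dashwu.com/fullchain.pem 30 || echo "renew soon"
```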

Configure Nginx

The default port for HTTP is port 80, but HTTPS defaults to port 443. See the full list of default port numbers here. Stop nginx and open up the configuration file:

$ sudo service nginx stop
$ sudo nano /etc/nginx/sites-available/shiny.conf

Your configuration file should look similar to this:

server {
    # Listen on 80 port
    listen 80;
    # For IPv6 addresses
    listen [::]:80;
    # Server name
    server_name dashwu.com;
    # The reverse proxy
    location / {
        proxy_pass http://127.0.0.1:3838/;
        proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 20d;
        proxy_buffering off;
    }
}

We modify the file as follows:

Change the listen directives to port 443 and add the ssl and http2 parameters (see documentation):

listen 443 ssl http2;
listen [::]:443 ssl http2;

Add a single catch-all HTTP block (above our HTTPS server block) to redirect all traffic to the HTTPS version of the site:

server {
    # Catches all HTTP requests, provided no other server block listens on port 80
    listen 80;
    listen [::]:80;

    location / {
        return 301 https://$host$request_uri;
    }
}

The line return 301 https://$host$request_uri redirects all traffic to the corresponding HTTPS server block with status code 301. The $host variable holds the domain name of the request.
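
Once the full configuration further down is in place and nginx has been restarted, the redirect can be checked from any machine by inspecting the response headers. The following is a sketch: is_https_redirect is a hypothetical helper name, and the domain in the usage comment is a placeholder.

```shell
# Reads HTTP response headers on standard input and succeeds if they describe
# a 301 redirect to an https:// URL. is_https_redirect is a hypothetical helper name.
is_https_redirect() {
    headers=$(cat)
    # The status line must carry code 301 and a Location header must point at https://
    printf '%s\n' "$headers" | head -n 1 | grep -q ' 301' &&
        printf '%s\n' "$headers" | grep -qi '^location: https://'
}

# Usage (placeholder domain):
# curl -sI http://your_domain_name/ | is_https_redirect && echo "redirect works"
```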

We also need to add the block of SSL directives that Mozilla’s SSL/TLS Configuration Generator returns. Our configuration file should now look like this:

server {
    listen 80;
    listen [::]:80;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name dashwu.com;

    location / {
        proxy_pass http://127.0.0.1:3838/;
        proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 20d;
        proxy_buffering off;
    }
  
    ssl_certificate /etc/letsencrypt/live/dashwu.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/dashwu.com/privkey.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # About 40000 sessions
    ssl_session_tickets off;

    # curl https://ssl-config.mozilla.org/ffdhe2048.txt > /path/to/dhparam
    ssl_dhparam /etc/ssl/certs/dhparam.pem;

    # intermediate configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # HSTS (ngx_http_headers_module is required) (63072000 seconds)
    # add_header Strict-Transport-Security "max-age=63072000" always;

    # OCSP stapling
    ssl_stapling on;
    ssl_stapling_verify on;

    # verify chain of trust of OCSP response using Root CA and Intermediate certs
    ssl_trusted_certificate /etc/letsencrypt/live/dashwu.com/chain.pem;
}

The path placeholders that I have changed after copying the results of the generator are as follows:

ssl_certificate and ssl_certificate_key:

ssl_certificate /etc/letsencrypt/live/dashwu.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/dashwu.com/privkey.pem;

For the ssl_dhparam directive, use the command in the comment to create a dhparam.pem file:

$ sudo -i
$ curl https://ssl-config.mozilla.org/ffdhe2048.txt > /etc/ssl/certs/dhparam.pem
# Add to nginx configuration
ssl_dhparam /etc/ssl/certs/dhparam.pem;

Another way is to use OpenSSL, making sure that 2048 bits is used and that the .pem file is written to the path referenced in the configuration:

$ sudo openssl dhparam -out /etc/ssl/certs/dhparam.pem 2048

The ssl_trusted_certificate directive specifies a file with trusted CA certificates in the PEM format used to verify client certificates and OCSP responses if ssl_stapling is enabled (True for our configuration). According to the README file located in /etc/letsencrypt/live/dashwu.com, the chain.pem file is used for OCSP stapling. Therefore, we need to add the following path to the configuration:

ssl_trusted_certificate /etc/letsencrypt/live/dashwu.com/chain.pem;

The HTTP Strict-Transport-Security response header informs browsers that the site should only be accessed using HTTPS, and that any future attempts to access it using HTTP should automatically be converted to HTTPS. Comment out the add_header Strict-Transport-Security "max-age=63072000" always; line since we only want to include it when we are sure everything is working.

Add Port 443 to EC2 Security Group

The following steps add a new rule to our EC2’s security group:

Navigate to the left pane and select “Security Groups” under “Network & Security”
Select the security group for your EC2 instance, choose Actions, Edit inbound rules
The following inbound rules should be added (HTTPS on TCP port 443):

Restart nginx:

$ sudo service nginx start

Include HTTP Strict-Transport-Security

Enter your domain name into your browser (the certificate is issued for the domain name, not for the elastic IP address). You should see that your browser has flagged your connection as secure:

Now you may include the HTTP Strict-Transport-Security header in the nginx configuration file by uncommenting the line:

$ sudo service nginx stop
$ sudo nano /etc/nginx/sites-available/shiny.conf
# Uncomment 
add_header Strict-Transport-Security "max-age=63072000" always;
# Restart
$ sudo service nginx start
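
To confirm that the header is actually being sent after the restart, the response headers can be inspected with curl. This is a sketch under assumptions: hsts_max_age is a hypothetical helper name and the domain in the usage comment is a placeholder. The value 63072000 seconds corresponds to 730 days.

```shell
# Reads HTTP response headers on standard input and prints the max-age value of
# the Strict-Transport-Security header; fails if the header is absent.
# hsts_max_age is a hypothetical helper name.
hsts_max_age() {
    grep -i '^strict-transport-security:' |
        tr -d '\r' |
        sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p' |
        grep .
}

# Usage (placeholder domain):
# curl -sI https://your_domain_name/ | hsts_max_age
```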

Step 9 (Optional): Password Protect

Sometimes we may need to password protect our application. For this task, we will use nginx again. In particular, the ngx_http_auth_basic_module module allows for limiting access to resources by validating the user name and password using the “HTTP Basic Authentication” protocol.

Configure Nginx

The following steps configure the nginx configuration file located at /etc/nginx/sites-available/shiny.conf to use the ngx_http_auth_basic_module. If you have not set up a domain name or switched to HTTPS, your configuration file may look different, but the steps should be very similar regardless. You essentially need to add two directives to the location block inside your server block.

Stop nginx and open up the configuration file:

$ sudo service nginx stop
$ sudo nano /etc/nginx/sites-available/shiny.conf

Add the following directives to the existing location block with the shortest prefix “/”:

location / {
    proxy_pass http://127.0.0.1:3838/;
    proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
    proxy_read_timeout 20d;
    proxy_buffering off;
    # Add the following
    auth_basic "Username and Password are required";
    auth_basic_user_file /etc/nginx/.htpasswd;
}

More details on nginx location blocks can be found here under section “Serving Static Content”. For more information on nginx server and location block selection algorithms, see the following article.

Your configuration file should now look similar to this:

server {
    listen 80;
    listen [::]:80;

    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name dashwu.com;

    # The reverse proxy
    location / {
        proxy_pass http://127.0.0.1:3838/;
        proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_read_timeout 20d;
        proxy_buffering off;
        auth_basic "Username and Password are required";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

    ssl_certificate /etc/letsencrypt/live/dashwu.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/dashwu.com/privkey.pem;
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;  # About 40000 sessions
    ssl_session_tickets off;

    # curl https://ssl-config.mozilla.org/ffdhe2048.txt > /path/to/dhparam
    ssl_dhparam /etc/ssl/certs/dhparam.pem;

    # intermediate configuration
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
    ssl_prefer_server_ciphers off;

    # HSTS (ngx_http_headers_module is required) (63072000 seconds)
    add_header Strict-Transport-Security "max-age=63072000" always;

    # OCSP stapling
    ssl_stapling on;
    ssl_stapling_verify on;

    # verify chain of trust of OCSP response using Root CA and Intermediate certs
    ssl_trusted_certificate /etc/letsencrypt/live/dashwu.com/chain.pem;
}

If you navigate to your domain name (or to your elastic IP address if you have not set up a domain name), you should see that your application is now password protected:

User Login

The final step is to create username-password pairs to access your application hosted on your EC2 instance at your elastic IP address. We could do so using apache2-utils (Debian, Ubuntu) or httpd-tools (RHEL/CentOS/Oracle Linux). In section “Upgrading and Installing System Packages”, we have already installed apache2-utils. To check if it is indeed installed:

$ dpkg --list | grep apache2-utils
# If not installed
$ sudo apt-get -y install apache2-utils

To create a password file and a first user, use the htpasswd utility with the -c flag (which stands for “create a new file”). The file pathname is the first argument and the username is the second argument:

$ sudo htpasswd -c /etc/nginx/.htpasswd user1

This should prompt you to enter a password for user1.

Creating additional user-password pairs does not require the -c flag since the file already exists:

$ sudo htpasswd /etc/nginx/.htpasswd user2

To list all usernames and their hashed passwords stored in the file:

$ cat /etc/nginx/.htpasswd

To remove a user, delete that user’s entry from the file using the -D flag:

$ sudo htpasswd -D /etc/nginx/.htpasswd user2

To change password for a user:

# Change password for user2
$ sudo htpasswd /etc/nginx/.htpasswd user2

Refer to the manual page to see other command line arguments.
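
With at least one user created, the password protection can also be checked from the command line: a request without credentials should be rejected with status 401, while a request with a valid pair should get through. The sketch below is an illustration only; http_status is a hypothetical helper name, and the domain and password in the usage comments are placeholders.

```shell
# Print the HTTP status code for a URL, optionally sending Basic credentials.
# http_status is a hypothetical helper name.
http_status() {
    # $1: URL, $2: optional "user:password"
    if [ -n "${2:-}" ]; then
        curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
    else
        curl -s -o /dev/null -w '%{http_code}' "$1"
    fi
}

# Usage (placeholder domain and password):
# http_status https://your_domain_name/                 # expect 401 without credentials
# http_status https://your_domain_name/ user1:password  # expect 200 with a valid pair
```

Keep in mind that passing the password on the command line leaves it in your shell history, so this is best done with a throwaway test user.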

Resources

The hyperlinks embedded throughout the post are the resources that have helped me improve my understanding of the deployment process, particularly through peeking under the hood of things. In addition, I also found the following links extremely useful.

Yang (Ken) Wu
