So, I encountered a situation the other day where we were having an issue with hosting some of our blogs. Without going too far into the details, we wanted to simplify the setup of one of our websites while still keeping the site as a whole redundant and replicated. The blog lived in a subdirectory of our website (http://mydomain.com/blogname). We finally resolved the issue by setting up and configuring an Apache Reverse Proxy, which we detail later in this article.
Subdomains vs Subdirectories SEO Discussion
So, we in the Systems team decided it would be much easier to just move the blogs to an independent server, use a subdomain of our primary domain to reach the blog, and use 301 redirects to send traffic from the old links over to the new ones. But this led to the link-juice discussion: would we lose SEO traffic? We listened to Matt Cutts’ Q&A on the topic (https://www.youtube.com/watch?v=_MswMYk05tk) several times and nearly did it, but there were just so many others on the internet who appeared to have had difficulty and lost traffic after moving from subfolders to subdomains.
Status Quo was the Best Potential SEO Outcome
While some principles of SEO obviously work, there are also a lot of unknowns, and proving them out can be very costly to your SEO. We would prefer to have hard facts and experience on our side before making a decision like this. We figured that the best outcome we could get by moving was the status quo, while the worst was losing a lot of SEO traffic.
To avert the risk, we looked for a solution to separate the blog from our redundant webpage while still having it located as a subfolder on the website. We decided to use Apache Reverse Proxy to resolve our issue.
Make a Subdomain as a Subdirectory with Apache Reverse Proxy
Apache Reverse Proxy to the rescue!
A reverse proxy sits in front of other servers and serves their content as though it were part of your own website. This gives you great flexibility, as you can add all types of applications that you may not want to, or be able to, serve directly on the website server itself.
You can also use an Apache Reverse Proxy to serve applications that are not otherwise reachable from outside. As long as the web server itself can access the other applications, it can serve them to others. In the example in the Apache Reverse Proxy Diagram to the right, you can see how it works. The Apache Reverse Proxy Server has three reverse proxies defined (app1, app2, app3). All of these application servers are hidden behind the firewall and are not normally reachable from the outside. But because the Apache Reverse Proxy Server does have access to them, it is able to serve them. Be warned! Somebody could serve your private applications this way!
In the situation of my blog server, it makes total sense. But you will want to be sure that anything you serve this way is non-confidential and appropriate to be publicly available.
Configure an Apache Reverse Proxy Server
For Apache to reverse proxy for you, it needs the mod_proxy module. It may already be installed on your machine. You can verify whether it is by grepping through the Apache configuration file(s):
# grep proxy_module /etc/httpd/conf/httpd.conf
LoadModule proxy_module modules/mod_proxy.so
In the case above, mod_proxy is already installed. If you don’t have it, you will need to install it, and we will discuss that in a moment.
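Grepping the configuration shows what Apache is told to load. As a complementary check, you can ask Apache itself which modules it ends up loading. The following is a minimal sketch, assuming the `httpd` binary is on your PATH and supports the `-M` option; `module_loaded` is a hypothetical helper name, not part of Apache.

```shell
# Succeed if the module named in $1 appears in a module listing read from stdin.
# "httpd -M" prints lines such as " proxy_module (shared)".
module_loaded() {
    grep -Eq "^[[:space:]]*$1[[:space:]]"
}

# Only query Apache when the binary is present (assumption: it is called "httpd").
if command -v httpd >/dev/null 2>&1; then
    for m in proxy_module proxy_html_module; do
        # Some builds write the listing to stderr, so merge both streams.
        if httpd -M 2>&1 | module_loaded "$m"; then
            echo "$m: loaded"
        else
            echo "$m: not loaded"
        fi
    done
fi
```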
When you use an Apache Reverse Proxy to serve an application from another server, you will likely run into an issue where the links on your page don’t work properly. In my case, any link in my blog would direct the browser to the subdomain location rather than the subdirectory, because the blog doesn’t even realize it is being reverse proxied. To resolve this problem, you will also want to install mod_proxy_html, which translates the links for you from the subdomain to the subdirectory. Check whether it is already installed on your Apache server:
# grep -r proxy_html /etc/httpd
/etc/httpd/conf.d/proxy_html.conf:LoadModule proxy_html_module modules/mod_proxy_html.so
In the case above, it is installed.
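To make the link problem described above concrete, here is a rough illustration of the kind of rewrite that has to happen to every page the blog returns. It is only a sketch of the idea using sed: mod_proxy_html parses the HTML rather than doing a blind text substitution, and the hostname `blog.mydomain.com` is a hypothetical subdomain chosen for the example.

```shell
# Rewrite absolute links to the (hypothetical) blog subdomain so that they
# point at the /blogname subdirectory instead. Reads HTML on stdin.
rewrite_links() {
    sed 's#http://blog\.mydomain\.com#/blogname#g'
}

# A link as the blog server emits it, then as the visitor should see it.
echo '<a href="http://blog.mydomain.com/2013/05/my-post">My post</a>' | rewrite_links
```

Without this translation, the first click on any link would take the visitor off the subdirectory and onto the subdomain.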
Installing Mod_Proxy and Mod_Proxy_Html with Yum
Typically, one would use the Apache Extension Tool (apxs) to install Apache modules. You can get more information about installing Apache modules using apxs here: http://httpd.apache.org/docs/2.2/programs/apxs.html Since CentOS 6 has come out, it has become much easier because they have made the modules available for installation using yum.
Install Mod_proxy and Mod_proxy_html on CentOS 6
On CentOS 6, mod_proxy_html is packaged in the EPEL (Extra Packages for Enterprise Linux) repository, which makes installation easy using yum if you have the EPEL repository installed. (mod_proxy itself normally ships as part of the stock httpd package.) If you still need to install EPEL, you can do it using these quick steps:
# wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
# rpm -Uvh ./epel-release-6-8.noarch.rpm
Preparing... ########################################### [100%]
package epel-release-6-8.noarch is already installed
Once the EPEL repository is installed, you can go ahead and install the modules this way:
# Install Prerequisites
yum install httpd-devel libxml2 libxml2-devel
# Install mod_proxy (ships with the httpd package) and mod_proxy_html (from EPEL)
yum install httpd mod_proxy_html
Add LoadModule statements to the Apache configuration
For Apache to use mod_proxy, it needs to load the modules when Apache is started. You simply need to edit your Apache configuration files and add:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_html_module modules/mod_proxy_html.so
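Before restarting Apache, it can help to confirm that every LoadModule line you expect is actually present. The sketch below assumes your configuration lives under /etc/httpd, as in the grep examples earlier; `missing_load_modules` is a hypothetical helper name. It also checks for proxy_http_module, which is not listed above: that is an assumption on my part, based on mod_proxy handing http:// backends to mod_proxy_http.

```shell
# Print each module name that has no LoadModule line under the given path.
#   $1: configuration file or directory to search
#   remaining arguments: module names, e.g. proxy_module
missing_load_modules() {
    conf_path=$1
    shift
    for module in "$@"; do
        if ! grep -rEq "^[[:space:]]*LoadModule[[:space:]]+${module}[[:space:]]" "$conf_path"; then
            echo "$module"
        fi
    done
}

# Only run the check when the assumed configuration directory exists.
if [ -d /etc/httpd ]; then
    missing_load_modules /etc/httpd proxy_module proxy_http_module proxy_html_module
fi
```

No output means every listed module has a LoadModule line somewhere under the path.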
Define Your Reverse Proxies in Apache Configuration Files
The final step in configuring Apache reverse proxies is to define the reverse proxies themselves in the Apache configuration file. You will want to put this in the proper area of your VirtualHost definition:
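# Each app's rewrite rules sit in its own <Location> so the "/" mappings do not collide
ProxyPass /app1 http://app1.mydomain.com
<Location /app1>
    ProxyHTMLURLMap http://app1.mydomain.com /app1
    ProxyHTMLURLMap / /app1/
</Location>
ProxyPass /app2 http://app2.mydomain.com
<Location /app2>
    ProxyHTMLURLMap http://app2.mydomain.com /app2
    ProxyHTMLURLMap / /app2/
</Location>
ProxyPass /app3 http://app3.mydomain.com
<Location /app3>
    ProxyHTMLURLMap http://app3.mydomain.com /app3
    ProxyHTMLURLMap / /app3/
</Location>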
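The three stanzas above follow one pattern, so a small helper can print the stanza for any additional application. This is a minimal sketch, and `proxy_stanza` is a hypothetical name. It keeps each application's rules inside its own `<Location>` block and adds two directives that are not in the configuration above, so treat them as assumptions to verify on your server: `ProxyPassReverse`, which rewrites Location headers when the backend issues a redirect, and `ProxyHTMLEnable On`, which switches the rewriting filter on in mod_proxy_html 3.1 and later (older releases use `SetOutputFilter proxy-html` instead).

```shell
# Print a reverse-proxy stanza for one application.
#   $1: public subdirectory name, e.g. app1
#   $2: backend base URL without a trailing slash, e.g. http://app1.mydomain.com
proxy_stanza() {
    name=$1
    backend=$2
    cat <<EOF
ProxyPass /$name $backend
<Location /$name>
    ProxyPassReverse $backend
    ProxyHTMLEnable On
    ProxyHTMLURLMap $backend /$name
    ProxyHTMLURLMap / /$name/
</Location>
EOF
}

# Print the stanza for app1; paste the output into your VirtualHost definition.
proxy_stanza app1 http://app1.mydomain.com
```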
After configuring the reverse proxies in the Apache configuration file, you need to restart Apache (httpd):
# service httpd restart
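A typo in the new directives will stop Apache from coming back up, so it is worth checking the syntax before restarting. The sketch below assumes `httpd -t` is available for the syntax check. `restart_if_valid`, `HTTPD_BIN`, and `RESTART_CMD` are hypothetical names introduced here; the two variables exist only so the function can be tried as a dry run.

```shell
# Restart Apache only if the configuration parses cleanly.
# HTTPD_BIN and RESTART_CMD are hypothetical override hooks; by default the
# function runs "httpd -t" followed by "service httpd restart".
restart_if_valid() {
    if "${HTTPD_BIN:-httpd}" -t; then
        ${RESTART_CMD:-service httpd restart}
    else
        echo "Apache configuration has errors; not restarting" >&2
        return 1
    fi
}

# Dry run: "true" stands in for httpd, so nothing is actually restarted.
HTTPD_BIN=true RESTART_CMD="echo would restart httpd" restart_if_valid
```

On the real server you would call `restart_if_valid` as root with neither variable set.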
You should now be able to reach your applications served at: