Skip to main content

Transparent Pass-through proxy with iptables – Part 2 (for HTTPS)

This is part 2 of my earlier post on how to set configure to use a http proxy transparently. This post deals with extending the same for transparent HTTPS proxying. Click Here for my earlier post which deals with HTTP proxying. For a quick-fix solution and list of files mentioned in this post skip to the bottom of the post.

After setting up a transparent http proxy on my dd-wrt router to transparently proxy my HTTP requests I haven’t had any issues for more than a year and was happily able to use it. But up until recently my local network used to allow direct HTTPS connections to external IP addresses. Now my network has disabled that, which means I need to forcefully use the HTTP proxy in order to be able to make HTTPS connections. Surprisingly this caused many more problems than I had anticipated. Lots of applications on android which seemed to work fine after setting proxy settings started failing badly! Notably gmail, hangouts, facebook messenger all only worked very sporadically. Forcefully requesting a sync didn’t work and kept failing inspite of the fact that proxy settings was configured. There were a lot of other issues to for some websites which I use like music streaming services which used websockets over 443 which stopped working as they didn’t use the proxy settings. As a result I set out to figure out a method to get HTTPS proxy working over my DD-WRT router to solve the issue once and for all. There were a couple of differences from my HTTP transparent proxying method which doesn’t allow me to use the same here:

HTTPS is an encrypted protocol and I can’t read the protocol stream to figure out which host I’m attempting to connect to. In HTTP I could blindly intercept the connection and simply look at the Host: header in the HTTP protocol to figure out the destination. After that tinyproxy would re-write the headers and path  so that the request will work fine with the HTTP proxy instead of the intended HTTP server. Now however I didn’t have that luxury. Lets have a look at how HTTPS connections works over a HTTP proxy.

As HTTPS is an encrypted connection working over SSL , a HTTP proxy cannot to do most of the caching or other functionality it normally does on a HTTP connection. HTTP Proxies come with a CONNECT method which allows the user to make an arbitrary TCP connection over the proxy to a specified host and port. Most HTTP proxies will allow the CONNECT method over port 443 (which is the port for HTTPS) , and often blocking other ports which the proxy does not want to allow access to. The CONNECT method over a HTTP proxy’s protocol looks like this:

Client:
CONNECT google.com:443

Server:
HTTP/1.0 200 Connection established


Basically you send a single CONNECT request with the hostname and port number ended with “\r\n” to which the proxy replies with a status code 200 to show successful connection again ended with “\r\n”. Thereafter the rest of the stream is as good as a raw TCP connection. Usually after this the standard SSL handshake begins normally over this stream as if it was a normal direct TCP connection. Alternatively services can use the port 443 for other purposes such as in my case for music streaming and the like. This still works because the proxy just makes a raw TCP connection and doesn’t attempt (usually) to monitor the protocol followed by the rest of the connection since it is usually an encrypted SSL and so mostly meaningless for the proxy.

So in our case a transparent proxying for HTTPS would have to first intercept a TCP connection intended for an external internet IP for port 443 and re-route that to some intermediate proxy program (which we shall design) intended to do the proxy handshake using the CONNECT method and once the raw TCP connection has been established, blindly forward packets so that the client is unaware that it is actually using a proxy and thinks that it is directly communicating with the HTTPS server. Making the CONNECT request is trivial, but the only problem we face now is in finding out who the original intended host was?

This problem luckily has a simple solution! The linux netfilter/iptables luckily add’s the original destination’s IP address to the tcp socket as an extra socket parameter SO_ORIGINAL_DST. Whenever we redirect packets in iptables using a DNAT to a new destination IP address the original destination IP is accessible using getsockopt with SO_ORIGINAL_DST on the socket to get the original destination.

Using this information I had previously written a simple python proxy forwarder + iptables combo designed to run on my system to do exactly this. You can get this here. To use the script simply run the iptables commands given in comments as root, and then run the python script (normal user suffices). The python script acts as the intermediary proxy to tunnel HTTPS connections through the HTTP proxy by making a CONNECT request and then forwarding packets in the rest of the connection.

So great, it works! But we’re still not done. All right well and done that we got it working , but we really need this for our wifi router! Luckily it being a linux box gives us some hope of being able to do the same thing, except on the router. A router with little under a few MB or ram and storage would not be the ideal device to be running a badly written script in python. Clearly I need a more  lean and mean solution, written in C. The main pain point here would be how on earth to cross-compile this for my router’s CPU. It’s an atheros chipset and hence I would need to cross-compile it for the mips architecture. I struggled a lot (a really lot, trust me) to set up my own mips cross-compiling tool chain from gcc. After a lot of effort I compiled gcc for mips , but I got stuck on some issue of not being able to import anything at all. I suspect I overlooked setting up libc now in retrospect. But anyways I gave up on this entirely in exasperation. This was a while back when https was working in my local network and I didn’t have that much of an inspiration to take the effort. Now that my local network forced me to get this working I finally scourged the net and finally realized that openwrt provided a MIPS toolchain for my chipset which is in fact the same toolchain used at dd-wrt and other places for compiling binaries for my router! (gcc toolchain for MIPS_34KC ar71XX at OpenWrt – Newer version might exist at their download site) . I still had no idea if the SO_ORIGINAL_DST would still work on my dd-wrt router, so first thing I did was code a very short program to test that and lo an behold it works, so now it was simply a matter of translating the python code I had already written into C using Linux sockets. So although a bit mundane to just translate, I went ahead and did just that and copied over the iptables I had used to work on my router for HTTP intercepting and modified the same for port 443. I set it all up on my router and ran it, and Sweet Success  :D. Initially it seemed a bit laggy and slow to setup the HTTPS connections, but it seemed quite usable for most purposes after that! So with it configured to autorun on router-reboot I have a flawlessly working solution which makes all my android applications sync again, my music streaming works again and I’m Happy! I’ve added links to the code, toolchain, as well as binaries for my atheros chipset below at the end of this post. Enjoy !

Note:

The approach I have finally used assumes that a working DNS server is accessible directly which can resolve external domain names to their valid ip addresses correctly. If such a DNS server is not accessible it is still possible to hack around this (though I have not attempted it) by using your own custom DNS server which forwards local name requests to the local DNS but intelligently provides unique fake IP addresses for external domain names. Then using the method above we can intercept connections and find the intended destination IP address. The intermediate proxy can communicate with the fake DNS and have a unique mapping of the fake IP-address to the correct original domain name. Then it’s a simple matter of making a CONNECT request to the Hostname instead of the ip addres!



Links/Files from post:

HTTPS transparent proxying for a linux desktop (works only for the single system)
Instructions: run iptables given in comments as root, run script as normal user.

OpenWrt toolchain for ar71xx
Instructions: Just extract and use for gcc. use to copmpile C programs to be run on the router

HTTPS transparent proxy code: tproxyhttps.c
Instructions: compile for mips as “mips-openwrt-linux-gcc proxy.c -ldl -lpthread -o tproxyhttps”
This is just my attempt at getting this working , it is not the most efficiently written code and could do better certainly. May degrade performance, but seems reasonably usable in my experience. Use at your own discretion. Avoid excessive logging when running on the router for a long-term.

HTTPS transparent proxy binary/source/router-script : tproxyhttps.tgz
This package includes a pre-compiled binary for mips along with the source code and a startup script onrouter.sh intended to be run on the router giving an idea of what iptables rules need to be set and how to run the binary. For an explanation of the iptables refer to my earlier post part-1.

Relevant files posted at github repository

Comments

Popular posts from this blog

Setting up a transparent pass-through proxy with iptables

Update: Part 2 for https posted in separate post! So for a very long now I’ve had a nagging issue with proxies. My primary source of internet is through my college HTTP Proxy and this adds a couple of issues whenever I am dealing with applications that don’t have proxy support coded in them. I have this issue often both on my laptop as well as on my android tablet (Youtube streaming!). Its a very distressing situation and I’ve always wanted to set-up a transparent proxy solution which could re-direct the traffic out of such applications to a sort of secondary proxy server which can interpret the requests and forward them to my college proxy server. Recently I managed to get this working! The main tool used for this was iptables. For those of you who haven’t heard of iptables at a glance it is a flexible firewall which is now part of the Linux kernel by default. But iptables is actually much more powerful and flexible than just a simple firewall to block ports. iptables ...

Converting Your DD-WRT Router to a clock!

So the other night I suddenly woke up in the middle of the night. I was fumbling through my extremely messy desk, which was next to my pillow, to find my phone so that I could see the time. Alas my desk was far too messy to find my phone in the dark. I was far too lazy to take the effort of turning on the lights just to see the time. That was when my eyes glanced upon the ominous green blinking LEDs blinking fervently on my router docked on my wall. I wished I had an LED clock instead and boom it hit me! I have a linux-based DD-WRT router, that means I have full control over it :P. So that was how I began my quest on converting a piece of network equipment to a digital clock! TP-Link WR740N Router The next day, I set upon my quest to find out how I could control the blinking LEDs on my router. I noticed that my router had a total of 9 LEDs , and I figured that was more than sufficient to display the time in binary to a sufficient degree. My plan was originally to use 4 LEDs t...

Welcome to my Site!

Hello! I am Anish Shankar. I love exploring new stuff & Technology interests me a lot. I am a Software Engineer at Google in the Bay Area, and have worked on a variety of Infrastructure projects. In the past I've participated in a lot of programming competitions and used to go by the handle "phinfinty". I've since retired from competitive programming, but the handle has stuck with me! I am a Linux enthusiast and love open source. I use Arch Linux currently for most of my personal work. This website is rarely updated and currently archival.

Some useful SSH configurations

Lately I was tired of having to repeatedly type my user name for my ssh connections. In my current setup I often ssh to two servers inside the IIIT (my college) network. merely typing ssh web.iiit.ac.in would try to use the username as my local computer’s login user name. So I was trying to get a workaround for this. A simple approach would be to rename my local computer’s user name to the IIIT server user name , but now that would be very lame. So I figured there must be some simple configuration available and looked up man ssh_config which gave an extremely detailed list of all the possible configuration options. Finally my configuration file looked like this : Host *.iiit.ac.in mirage web User iiit_login_user Host * ControlMaster auto ControlPath /home/phinfinity/.ssh/%r@%h:%p GSSAPIAuthentication no The first line specifies the categories for which the configuration below are to be used. Here I have specified the configuration to be applied for 3 possible categories “*.ii...