This is part 2 of my earlier post on how to configure a DD-WRT router to use an HTTP proxy transparently. This post extends the same setup to transparent HTTPS proxying. See my earlier post for the HTTP side. For a quick-fix solution and the list of files mentioned here, skip to the bottom of the post.
After I set up a transparent HTTP proxy on my DD-WRT router, it handled my HTTP requests without any issues for more than a year. Until recently, though, my local network allowed direct HTTPS connections to external IP addresses. That has now been disabled, which means HTTPS connections also have to go through the HTTP proxy. Surprisingly, this caused many more problems than I had anticipated. Lots of Android applications which had seemed to work fine with proxy settings configured started failing badly. Notably Gmail, Hangouts and Facebook Messenger worked only sporadically. Forcing a sync kept failing even though the proxy settings were configured. There were other issues too: some websites I use, such as music streaming services which use WebSockets over port 443, stopped working because they ignored the proxy settings. So I set out to get HTTPS proxying working on my DD-WRT router and solve the issue once and for all. One key difference from my transparent HTTP proxying method prevents me from reusing it as-is:
HTTPS is an encrypted protocol, so I can’t read the protocol stream to figure out which host the client is trying to reach. With HTTP I could blindly intercept the connection and simply look at the Host: header to find the destination. After that, tinyproxy would rewrite the headers and path so that the request worked with the HTTP proxy instead of the intended HTTP server. With HTTPS I don’t have that luxury. Let’s have a look at how HTTPS connections work over an HTTP proxy.
As HTTPS is an encrypted connection running over SSL, an HTTP proxy cannot do most of the caching or other processing it normally does on an HTTP connection. Instead, HTTP proxies provide a CONNECT method which lets the client open an arbitrary TCP connection through the proxy to a specified host and port. Most HTTP proxies allow CONNECT to port 443 (the HTTPS port) and often block other ports which the proxy does not want to give access to. A CONNECT exchange with an HTTP proxy looks like this:
Client:
CONNECT google.com:443 HTTP/1.1
Server:
HTTP/1.0 200 Connection established
Basically you send a single CONNECT request naming the hostname and port, terminated by an empty line (so the request ends in “\r\n\r\n”), and the proxy replies with status code 200 to signal a successful connection, its reply likewise terminated by an empty line. From then on the stream is as good as a raw TCP connection. Usually the standard SSL handshake then begins over this stream as if it were a normal direct TCP connection. Services can also use port 443 for other purposes, as my music streaming service does. This still works because the proxy just relays a raw TCP connection and (usually) doesn’t attempt to monitor the protocol spoken over it, since the traffic is typically SSL-encrypted and therefore mostly meaningless to the proxy.
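To make the handshake concrete, here is a minimal Python sketch of the two halves: building the request bytes and reading the status code out of the proxy's reply. The function names are mine, made up for illustration, and the request is written the way HTTP/1.1 expects it (a request line with a version, a Host header, and an empty line to finish). This is not the code from my script, just the idea.

```python
def build_connect_request(host: str, port: int) -> bytes:
    """Return the bytes a client sends to ask the proxy for a tunnel."""
    target = f"{host}:{port}"
    return (
        f"CONNECT {target} HTTP/1.1\r\n"
        f"Host: {target}\r\n"
        "\r\n"  # the empty line ends the request
    ).encode("ascii")


def parse_connect_status(response_head: bytes) -> int:
    """Extract the status code from the first line of the proxy's reply."""
    status_line = response_head.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {status_line!r}")
    return int(parts[1])
```

A caller would send the request, read until it sees the empty line that ends the reply, and treat anything other than a 2xx status as a refused tunnel.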
So in our case, transparent proxying for HTTPS has to first intercept a TCP connection headed for an external IP on port 443 and reroute it to an intermediate proxy program (which we shall write). That program performs the proxy handshake using the CONNECT method and, once the raw TCP connection is established, blindly forwards data in both directions, so that the client is unaware it is using a proxy and thinks it is talking directly to the HTTPS server. Making the CONNECT request is trivial; the only problem left is finding out which host the client originally wanted to reach.
Luckily this problem has a simple solution! Linux netfilter/iptables keeps the original destination address of a redirected connection and exposes it on the TCP socket through the socket option SO_ORIGINAL_DST. Whenever we redirect packets in iptables using DNAT to a new destination IP address, calling getsockopt with SO_ORIGINAL_DST on the accepted socket returns the original destination IP and port.
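As a rough sketch of that lookup in Python: the standard socket module does not export SO_ORIGINAL_DST, so the value 80 below is taken from the Linux header linux/netfilter_ipv4.h, and the result is a raw struct sockaddr_in that has to be unpacked by hand. The helper names are hypothetical, and original_destination only returns something useful on a connection that iptables actually redirected.

```python
import socket
import struct
from typing import Tuple

SOL_IP = 0            # IP-level socket options
SO_ORIGINAL_DST = 80  # from <linux/netfilter_ipv4.h>; not exported by the socket module
SOCKADDR_IN_LEN = 16  # sizeof(struct sockaddr_in)


def unpack_sockaddr_in(raw: bytes) -> Tuple[str, int]:
    """Decode a struct sockaddr_in: 2 bytes family, 2 bytes port (big-endian), 4 bytes IPv4."""
    port = struct.unpack_from("!H", raw, 2)[0]
    ip = socket.inet_ntoa(raw[4:8])
    return ip, port


def original_destination(conn: socket.socket) -> Tuple[str, int]:
    """Ask netfilter where an accepted, redirected connection was originally headed."""
    raw = conn.getsockopt(SOL_IP, SO_ORIGINAL_DST, SOCKADDR_IN_LEN)
    return unpack_sockaddr_in(raw)
```

The intermediate proxy would call original_destination on each socket returned by accept() and use the result as the target of its CONNECT request.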
Using this, I had previously written a simple Python proxy forwarder plus a set of iptables rules, designed to run on my own system and do exactly this. You can get this here. To use the script, run the iptables commands given in its comments as root and then run the Python script (a normal user suffices). The script acts as the intermediary proxy: it tunnels HTTPS connections through the HTTP proxy by making a CONNECT request and then forwarding the data for the rest of the connection.
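The forwarding half of such a script can be sketched in a few lines with select. This is a simplified illustration rather than my actual script: the function name relay is made up, and it stops as soon as either side closes, without handling half-closed connections or errors.

```python
import select
import socket


def relay(client: socket.socket, upstream: socket.socket, bufsize: int = 4096) -> None:
    """Copy bytes in both directions until either side closes its end."""
    peers = {client: upstream, upstream: client}
    while True:
        # Block until at least one of the two sockets has data (or EOF) to read.
        readable, _, _ = select.select(list(peers), [], [])
        for src in readable:
            data = src.recv(bufsize)
            if not data:  # the peer closed the connection: stop relaying
                return
            peers[src].sendall(data)
```

In the full proxy, relay would be called with the intercepted client socket and the socket to the HTTP proxy, right after the proxy has answered the CONNECT request with a 200.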
So great, it works! But we’re still not done. It is all well and good that it works on my desktop, but I really need this on my wifi router. Luckily the router is a Linux box, which gives some hope of doing the same thing there. A router with only a few MB of RAM and storage is not the ideal device for running a badly written Python script, though. Clearly I needed a leaner solution, written in C.
The main pain point was how on earth to cross-compile this for my router’s CPU. It’s an Atheros chipset, so I had to cross-compile for the MIPS architecture. I struggled a lot (really a lot, trust me) to set up my own MIPS cross-compiling toolchain from gcc. After a lot of effort I got gcc compiled for MIPS, but then got stuck on not being able to include anything at all. In retrospect I suspect I overlooked setting up libc. Anyway, I gave up on this entirely in exasperation. That was a while back, when HTTPS was still working in my local network and I didn’t have much motivation to make the effort.
Now that my local network forced me to get this working, I scoured the net and finally realized that OpenWrt provides a MIPS toolchain for my chipset, which is in fact the same toolchain used at DD-WRT and other places for compiling binaries for my router! (gcc toolchain for MIPS_34KC ar71XX at OpenWrt; a newer version might exist at their download site.)
I still had no idea whether SO_ORIGINAL_DST would work on my DD-WRT router, so the first thing I did was write a very short program to test that, and lo and behold, it worked. From there it was simply a matter of translating the Python code I had already written into C using Linux sockets. It was a bit mundane, but I went ahead and did just that, then copied over the iptables rules I had used on my router for HTTP interception and modified them for port 443. I set it all up on my router, ran it, and sweet success :D.
Initially it seemed a bit laggy and slow to set up HTTPS connections, but after that it has been quite usable for most purposes. With it configured to start automatically when the router reboots, I have a working solution: all my Android applications sync again, my music streaming works again, and I’m happy! I’ve added links to the code, the toolchain, and binaries for my Atheros chipset at the end of this post. Enjoy!
Note:
The approach I finally used assumes that a working DNS server is directly accessible and resolves external domain names to their correct IP addresses. If no such DNS server is available, it should still be possible to hack around this (though I have not attempted it) by running your own custom DNS server which forwards local name requests to the local DNS but hands out unique fake IP addresses for external domain names. Using the method above we can then intercept connections and recover the intended (fake) destination IP address. The intermediate proxy can consult the fake DNS server, which keeps a unique mapping from each fake IP address to the original domain name. Then it’s a simple matter of making the CONNECT request to the hostname instead of the IP address!
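Since I have not built this, the following is only a sketch of the bookkeeping such a fake DNS server would need: one unique fake address per external hostname, plus the reverse lookup the intermediate proxy would use. The class name, its methods and the 198.18.0.0/15 range (a block reserved for benchmarking, picked here so it should not collide with real destinations) are all my assumptions.

```python
import ipaddress
from typing import Optional


class FakeAddressPool:
    """Hand out one unique fake IPv4 address per hostname and remember the mapping."""

    def __init__(self, cidr: str = "198.18.0.0/15") -> None:
        # Iterator over the usable host addresses in the chosen range.
        self._hosts = iter(ipaddress.ip_network(cidr).hosts())
        self._ip_by_name = {}
        self._name_by_ip = {}

    def fake_ip_for(self, hostname: str) -> str:
        """Return the fake IP for a hostname, allocating a new one on first use."""
        name = hostname.rstrip(".").lower()  # DNS names are case-insensitive
        if name not in self._ip_by_name:
            ip = str(next(self._hosts))  # raises StopIteration if the range runs out
            self._ip_by_name[name] = ip
            self._name_by_ip[ip] = name
        return self._ip_by_name[name]

    def hostname_for(self, fake_ip: str) -> Optional[str]:
        """Reverse lookup used by the proxy; None if the IP was never handed out."""
        return self._name_by_ip.get(fake_ip)
```

The DNS side would answer external queries with fake_ip_for(name), and the proxy would pass the address obtained from SO_ORIGINAL_DST to hostname_for before building its CONNECT request.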
Links/Files from post:
HTTPS transparent proxying for a linux desktop (works only for the single system)
Instructions: run iptables given in comments as root, run script as normal user.
OpenWrt toolchain for ar71xx
Instructions: just extract it and use the bundled gcc to compile C programs that will run on the router.
HTTPS transparent proxy code: tproxyhttps.c
Instructions: compile for mips as “mips-openwrt-linux-gcc proxy.c -ldl -lpthread -o tproxyhttps”
This is just my attempt at getting this working; it is not the most efficiently written code and could certainly be improved. It may degrade performance, but it seems reasonably usable in my experience. Use at your own discretion. Avoid excessive logging when running it on the router long-term.
HTTPS transparent proxy binary/source/router-script : tproxyhttps.tgz
This package includes a pre-compiled binary for MIPS along with the source code and a startup script, onrouter.sh, intended to be run on the router; the script shows which iptables rules need to be set and how to run the binary. For an explanation of the iptables rules, refer to part 1, my earlier post.
Relevant files posted at github repository