Transparent Pass-through proxy with iptables – Part 2 (for HTTPS)

This is part 2 of my earlier post on how to configure a system to use an HTTP proxy transparently. This post extends the same setup to transparent HTTPS proxying. See my earlier post (Part 1) for the HTTP proxying setup. For a quick-fix solution and the list of files mentioned in this post, skip to the bottom of the post.

After setting up a transparent HTTP proxy on my dd-wrt router, I had no issues for more than a year. Until recently, though, my local network allowed direct HTTPS connections to external IP addresses. Now the network blocks them, which means I am forced to go through the HTTP proxy to make HTTPS connections at all. Surprisingly, this caused many more problems than I had anticipated 😮. Lots of Android applications that seemed to work fine once proxy settings were configured started failing badly! Notably Gmail, Hangouts and Facebook Messenger worked only sporadically. Forcing a sync didn't work and kept failing even though the proxy settings were configured. Some websites I use also broke, such as music streaming services that use WebSockets over port 443 and don't honour the proxy settings. So I set out to get HTTPS proxying working through my DD-WRT router and solve the issue once and for all. There were a couple of differences from HTTP that prevent me from reusing my HTTP transparent proxying method here:

HTTPS is an encrypted protocol, so I can't read the stream to figure out which host the client is trying to reach. With HTTP I could blindly intercept the connection and simply look at the Host: header to find the destination. tinyproxy would then rewrite the headers and path so that the request works through the HTTP proxy instead of going straight to the intended HTTP server. With HTTPS I don't have that luxury. Let's look at how HTTPS connections work over an HTTP proxy.

Because HTTPS is encrypted (SSL/TLS), an HTTP proxy cannot do most of the caching or other processing it normally does on an HTTP connection. Instead, HTTP proxies support a CONNECT method. It lets the client open an arbitrary TCP connection through the proxy to a specified host and port. Most HTTP proxies allow CONNECT to port 443 (the HTTPS port) and often block other ports they don't want to allow access to. A CONNECT exchange with an HTTP proxy looks like this:

Client:
CONNECT google.com:443 HTTP/1.1
Host: google.com:443

Server:
HTTP/1.0 200 Connection established

 

In other words, you send a CONNECT request line containing the hostname, port and HTTP version, optionally followed by headers. You terminate it with an empty line: each line ends with "\r\n", so the request ends with "\r\n\r\n". The proxy replies with a status line carrying code 200 to signal a successful connection, again followed by an empty line. From then on the stream is as good as a raw TCP connection to the target. Usually the standard SSL handshake then begins over this stream, exactly as it would over a direct TCP connection. Services can also use port 443 for other purposes, as with the music streaming in my case. This still works because the proxy just makes a raw TCP connection. It (usually) doesn't try to inspect the rest of the stream, since that stream is normally SSL-encrypted and therefore meaningless to the proxy.
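To make the exchange above concrete, here is a minimal Python sketch that builds the CONNECT request and checks the proxy's status line. This is not the code from the script linked below, and the function names are hypothetical. It assumes the target is a hostname or IPv4 address. It accepts both HTTP/1.0 and HTTP/1.1 status lines, since some proxies (Squid, for example) answer with HTTP/1.1.

```python
import re


def build_connect_request(host, port):
    """Build the CONNECT request a client sends to an HTTP proxy."""
    target = f"{host}:{port}"
    # Request line, a Host header, then an empty line ends the header block.
    return f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n".encode("ascii")


# Match "HTTP/1.0 200 ..." as well as "HTTP/1.1 200 ..." and capture the code.
_STATUS_RE = re.compile(rb"^HTTP/1\.[01] (\d{3})")


def tunnel_established(reply_head):
    """Return True if the proxy's reply header block reports status 200."""
    match = _STATUS_RE.match(reply_head)
    return match is not None and match.group(1) == b"200"
```

Any status other than 200 (for example 407 when the proxy wants credentials) means the tunnel was not opened. In that case the intermediary should close the client connection rather than forward anything.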

So, in our case, transparent HTTPS proxying has to first intercept a TCP connection headed for an external IP on port 443. It then has to re-route that connection to an intermediate proxy program (which we shall write). That program performs the proxy handshake using the CONNECT method. Once the tunnel is established, it blindly forwards data in both directions. The client is unaware that it is using a proxy and thinks it is talking directly to the HTTPS server. Making the CONNECT request is trivial; the only real problem is finding out which host the client originally intended to reach.

Luckily this problem has a simple solution! Linux netfilter/iptables remembers the original destination of a redirected connection and exposes it through the SO_ORIGINAL_DST socket option. Suppose we redirect packets with an iptables DNAT (or REDIRECT) rule to a new destination IP address. The intermediate program can then call getsockopt() with SO_ORIGINAL_DST on the accepted socket to get the original destination IP address and port.
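In Python this lookup might look like the sketch below; the helper names are hypothetical. Python's socket module does not export SO_ORIGINAL_DST, so the sketch assumes the Linux value 80 from <linux/netfilter_ipv4.h>. Asking getsockopt() for a 16-byte buffer returns the raw struct sockaddr_in as bytes, which has to be unpacked by hand.

```python
import socket
import struct

# Value of SO_ORIGINAL_DST in <linux/netfilter_ipv4.h>; not exported by Python.
SO_ORIGINAL_DST = 80
# SOL_IP and IPPROTO_IP are both 0 on Linux.
SOL_IP = socket.IPPROTO_IP


def parse_sockaddr_in(buf):
    """Decode a raw 16-byte struct sockaddr_in into (ip, port)."""
    # Layout: sin_family (2 bytes, host order), sin_port (2 bytes, network
    # order), sin_addr (4 bytes), then 8 bytes of zero padding.
    port = struct.unpack("!H", buf[2:4])[0]
    ip = socket.inet_ntoa(buf[4:8])
    return ip, port


def original_destination(conn):
    """Return the (ip, port) the client dialled before iptables redirected it.

    Linux only, and only on the machine that applied the NAT rule.
    """
    buf = conn.getsockopt(SOL_IP, SO_ORIGINAL_DST, 16)
    return parse_sockaddr_in(buf)
```

Typical use is right after accepting an intercepted connection: `conn, _ = listener.accept()` followed by `ip, port = original_destination(conn)`.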

Using this, I had previously written a simple Python proxy forwarder + iptables combo to do exactly this on my desktop. It is linked at the bottom of this post. To use it, run the iptables commands given in the script's comments as root. Then run the Python script (a normal user suffices). The script acts as the intermediate proxy: it tunnels HTTPS connections through the HTTP proxy by making a CONNECT request and then forwarding data for the rest of the connection.
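The forwarding half of such an intermediary could look roughly like the sketch below. It is not the linked script, and it rests on my own assumptions: one thread per direction and no timeouts. Bytes that arrive right after the proxy's blank line are returned so they can be passed on to the client. The names read_reply_head(), _pump() and relay() are hypothetical.

```python
import socket
import threading


def read_reply_head(sock, limit=8192):
    """Read the proxy's reply up to the empty line that ends its headers.

    Returns (header_bytes, leftover); leftover already belongs to the
    tunnelled stream and must be forwarded to the client.
    """
    data = b""
    while b"\r\n\r\n" not in data:
        if len(data) > limit:
            raise ConnectionError("proxy reply headers too long")
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("proxy closed the connection during CONNECT")
        data += chunk
    head, _, leftover = data.partition(b"\r\n\r\n")
    return head, leftover


def _pump(src, dst):
    """Copy bytes from src to dst until src hits EOF, then pass the EOF on."""
    try:
        while True:
            data = src.recv(4096)
            if not data:  # b"" only when the peer closed its sending side
                break
            dst.sendall(data)
    except OSError:
        pass  # this sketch treats resets and other errors like EOF
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def relay(client, upstream):
    """Forward data both ways until both directions finish, then close both."""
    back = threading.Thread(target=_pump, args=(upstream, client), daemon=True)
    back.start()
    _pump(client, upstream)
    back.join()
    # Closing both sockets in one place avoids leaking file descriptors.
    client.close()
    upstream.close()
```

The remaining steps of a connection handler would go in this order:
1. Connect to the proxy.
2. Send the CONNECT request for the original destination.
3. Call read_reply_head() and check for a 200 status.
4. Send any leftover bytes to the client.
5. Call relay(client, proxy_sock).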

Great, it works! But we're still not done: what we really need is this on the wifi router. Luckily the router is a Linux box, which gives us some hope of doing the same thing there. However, a router with only a few MB of RAM and storage is not the ideal device for running a badly written Python script. Clearly I needed a lean and mean solution written in C. The main pain point is cross-compiling for the router's CPU. It's an Atheros chipset, so I needed to cross-compile for the MIPS architecture. I struggled a lot (a really lot, trust me) to build my own MIPS cross-compiling toolchain from gcc. After a lot of effort I had gcc built for MIPS, but I got stuck because I couldn't include any headers at all. In retrospect I suspect I overlooked setting up libc. Anyway, I gave up on this entirely in exasperation. That was a while back, when HTTPS still worked directly on my local network, so I didn't have much motivation to push through.

Now that my local network forced me to get this working, I finally scoured the net. I realized that OpenWrt provides a MIPS toolchain for my chipset. It is in fact the same toolchain used by dd-wrt and others to compile binaries for my router! (gcc toolchain for MIPS_34KC ar71xx at OpenWrt; a newer version might exist on their download site.) I still had no idea whether SO_ORIGINAL_DST would work on my dd-wrt router. So the first thing I did was write a very short test program, and lo and behold, it works 😀. From there it was simply a matter of translating the Python code I had already written into C using Linux sockets. Although translating was a bit mundane, I went ahead and did just that. I then took the iptables rules I had used on my router for HTTP interception and modified them for port 443.
I set it all up on my router, ran it, and: sweet success :D. Initially HTTPS connections seemed a bit laggy and slow to set up, but after that it was quite usable for most purposes! I configured it to start automatically when the router reboots, and now I have a flawlessly working solution. All my Android applications sync again, my music streaming works again, and I'm happy 🙂. I've added links to the code, the toolchain, and binaries for my Atheros chipset at the end of this post. Enjoy!

Note:

The approach I finally used assumes that a working DNS server is directly reachable and resolves external domain names to their correct IP addresses. If no such DNS server is reachable, it is still possible to hack around this, though I have not tried it. You would run your own custom DNS server that forwards local name requests to the local DNS but hands out unique fake IP addresses for external domain names. Using the method above, we can then intercept connections and find the intended (fake) destination IP address. The intermediate proxy can consult the fake DNS server's mapping from each fake IP address back to the original domain name. Then it's a simple matter of making the CONNECT request to the hostname instead of the IP address!
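Below is a hypothetical sketch of the mapping table such a fake DNS server and the intermediate proxy might share; it is untested against any real setup. It assumes fake addresses are drawn from 198.18.0.0/15, a range reserved for benchmarking, so it should not collide with real destinations. It leaves out the DNS wire protocol entirely. The class and method names are made up for illustration.

```python
import ipaddress


class FakeIPMap:
    """Hypothetical name <-> fake-IP table shared by a fake DNS and the proxy."""

    def __init__(self, network="198.18.0.0/15"):
        # iter() so that next() works even if hosts() returns a list.
        self._pool = iter(ipaddress.ip_network(network).hosts())
        self._by_name = {}
        self._by_ip = {}

    def ip_for(self, name):
        """Hand out a stable, unique fake IP for an external name."""
        name = name.lower().rstrip(".")
        if name not in self._by_name:
            ip = str(next(self._pool))  # StopIteration once the pool runs out
            self._by_name[name] = ip
            self._by_ip[ip] = name
        return self._by_name[name]

    def name_for(self, ip):
        """Map an intercepted destination IP back to the original hostname."""
        return self._by_ip.get(ip)
```

The DNS side would answer a query for an external name with `table.ip_for(name)`. The proxy, after reading the intercepted destination via SO_ORIGINAL_DST, would send `CONNECT <table.name_for(ip)>:443`. Both sides need access to the same table, for example by running in one process or through some IPC.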

 

Links/Files from post:

HTTPS transparent proxying for a linux desktop (works only for the single system)
Instructions: run the iptables commands given in the comments as root, then run the script as a normal user.

OpenWrt toolchain for ar71xx
Instructions: just extract it and use its gcc to compile C programs that run on the router.

HTTPS transparent proxy code: tproxyhttps.c
Instructions: compile for mips as “mips-openwrt-linux-gcc proxy.c -ldl -lpthread -o tproxyhttps”
This is just my attempt at getting this working; it is not the most efficiently written code and could certainly be improved. It may degrade performance, but it seems reasonably usable in my experience. Use at your own discretion, and avoid excessive logging when running it on the router long-term.

HTTPS transparent proxy binary/source/router-script : tproxyhttps.tgz
This package includes a pre-compiled MIPS binary along with the source code and a startup script, onrouter.sh, intended to be run on the router. The script shows which iptables rules need to be set and how to run the binary. For an explanation of the iptables rules, refer to my earlier post (Part 1).

Relevant files posted at github repository

Posted in Uncategorized
9 comments on “Transparent Pass-through proxy with iptables – Part 2 (for HTTPS)”
  1. Bro, thanks very much. I was looking for a similar solution. I have already set up my fake DNS server and will try this ASAP and reply back. 🙂

  2. alex says:

    I tried to set up your Python + iptables combo, but my external proxy requires authentication with a username and password. How can I modify the Python script to include the authentication? Thanks

    • phinfinity says:

      I'm unfamiliar with how HTTP proxy authentication works, and I don't currently have access to a proxy server with authentication enabled to inspect the protocol. However, from a brief look at the protocol, I suspect it should be a simple matter of adding a Proxy-Authorization header with "username:password" encoded as Base64.
      Have a look at steps 1-3 in the question at http://stackoverflow.com/questions/10023636/http-spec-proxy-authorization-and-authorization-headers
      If you know for sure that the proxy server requires authentication, I suspect you could skip the initial challenge round-trip and send the Proxy-Authorization header straight away.
      Code modifications for the Python script will involve adding the header to the https_conn variable in wrap_https_proxy() with something like:
      'Proxy-Authorization: Basic %s' % base64.b64encode('%s:%s' % (username, password))
      So all in all it will look like "CONNECT host:port HTTP/1.1\r\nHost: hostname\r\nProxy-Authorization: …\r\n\r\n"
      Hope this works 🙂
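On Python 3, base64.b64encode() takes and returns bytes, so the one-liner above needs an encode/decode step. Below is a minimal sketch with hypothetical helper names. It assumes the Basic scheme and encodes the credentials as UTF-8.

```python
import base64


def proxy_auth_header(username, password):
    """Build a Basic Proxy-Authorization header line (Python 3)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Proxy-Authorization: Basic {token}\r\n"


def connect_request(host, port, username=None, password=None):
    """CONNECT request with optional Basic proxy credentials."""
    target = f"{host}:{port}"
    head = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n"
    if username is not None:
        head += proxy_auth_header(username, password)
    # The empty line terminates the request headers.
    return (head + "\r\n").encode("ascii")
```

Basic credentials are only Base64-encoded, not encrypted. The proxy connection itself is plain HTTP, so anyone on that path can read them.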

      • alex says:

        Oh I see, I think you're right that it can work like this; I will modify the HTTP headers in the Python program to add authorization. But for the moment I am struggling to get the destination addr/port with dst = conn.getsockopt(SOL_IP, SO_ORIGINAL_DST, 16). Unfortunately it does not work on my system. For testing I have a Windows machine with an Android emulator (iptables version 1.3.7) running on it. I run your Python script as-is on my Windows machine. On the Android emulator I run the following rules (10.0.2.2 is the emulator's alias for the Windows machine's localhost):
        iptables -t nat -A OUTPUT -d 127.0.0.1 -j ACCEPT
        iptables -t nat -A OUTPUT -d 10.0.2.2 -j ACCEPT
        iptables -t nat -A OUTPUT -p tcp --dport 443 -j DNAT --to 10.0.2.2:1234
        The Python script catches the HTTPS connection fine, but dst = conn.getsockopt(SOL_IP, SO_ORIGINAL_DST, 16) raises an exception. I suspect that iptables does not set SO_ORIGINAL_DST on the socket, or perhaps uses another option name for the original dest/port? I am trying to find the original dest/port another way. Normally iptables forwards the HTTP request CONNECT host:port to the Python program. Shouldn't I be able to extract the HTTP header and read the original dest/port without using getsockopt()?
        Thanks

      • phinfinity says:

        Ah, I suspect the problem here is that iptables and the Python script are running on two different systems. The original destination is recorded by iptables (netfilter connection tracking) on the machine that performs the redirect and is only available there. When the connection passes from the emulator to your host system, that information is lost, which is why you can't get the original destination.
        For HTTP you can simply read the HTTP headers to find out the original destination (as it is done in my previous post), but for HTTPS since it is encrypted you will not be able to read the headers.
        A solution in this case might be to run the Python script within the emulator as well, which might be a bit tricky. This is why I rewrote the Python script in C for MIPS to run on my router: iptables and the proxy script both need to run on the same machine. I've also heard there are Android apps that do the same iptables + proxy script combination; searching for those might be an alternative option.

  3. alex says:

    Oh, I understand. Yeah, the emulator uses 10.0.2.2 to reach the host machine's localhost even though it runs on the same machine. I tried replacing 10.0.2.2 with 127.0.0.1 in the iptables DNAT rule as a hack, but it obviously doesn't work; worth a try 🙂
    Well, I have no other choice than to run a proxy forwarder program on the emulator, with all the problems I can already foresee 🙂
    Thank you again for your help, your tutorial and programs are really good and help to understand the https mechanism.

  4. alex says:

    Yeahhh, this is the first time I've been able to redirect HTTPS from my Android emulator to the external proxy with authentication, thanks to your tutorial and program 🙂
    You know, it is really painful to redirect HTTPS packets to a proxy in the emulator for all apps (except the browser), because Android's behaviour depends on the version, platform and so on. It is really difficult to set up a generic, properly working proxy forwarder on the emulator. I tried a few apps (proxydroid, tproxy, redsocks) and all failed, until I found your tutorial: it is simple, clear and working 🙂
    In my opinion it is the best solution that can work for all Android emulator versions: just use iptables in the emulator to redirect packets to an external proxy forwarder like your Python script, which then handles connections to the upstream HTTPS proxy.
    For the moment I removed the getsockopt() call and hard-coded srv_host = "A.B.C.D" and srv_port = 443, so the lost destination address problem is solved.
    You are good, your code for proxy authentication works perfectly. I just added the header below and authentication works fine. Sure, it does not handle authentication errors, but that is OK for me.
    https_conn = "CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}\r\n" + "Proxy-Authorization: Basic %s\r\n\r\n" % base64.b64encode('%s:%s' % (PROXY_USERNAME, PROXY_PASSWORD))
    I read the note at the end of the tutorial. You talk about a fake DNS server for finding the destination address. I don't fully understand what you mean, but it gives me an idea. Do you think I can use such a fake DNS mechanism to find the destination IP address instead of using getsockopt(), and just assume the port is 443? If so, how can I do that in the current Python program?
    Thanks

  5. alex says:

    I finally got everything working like a charm with the help of your Python program and iptables rules. I can include an HTTPS proxy and also a SOCKS proxy, with or without authentication, and I am very happy. I have one question related to performance, if you don't mind. In your pipe_data() function, all connections are closed when data has 0 length. But if the client application has no activity for some time, I guess the data received from the client is also 0 length, isn't it? If the client starts to send data after this pause, new connections will be initiated again. Why not keep all the sockets alive during this pause?
    while True:
        data = s_from.recv(2048)
        if len(data) == 0:
            return
        s_to.send(data)
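On the question above, a blocking recv() does not return an empty result while the client is merely idle; it simply waits. It returns an empty bytes object only once the peer has closed (or shut down) its sending side. So returning from pipe_data() on len(data) == 0 ends the connection only after a real EOF. The small demonstration below uses a local socket pair. The timeout exists only so the demo doesn't hang, and recv_or_timeout() is a hypothetical helper.

```python
import socket


def recv_or_timeout(sock, timeout):
    """Return received bytes, b"" on EOF, or None if nothing arrived in time."""
    sock.settimeout(timeout)
    try:
        return sock.recv(2048)
    except socket.timeout:
        return None


a, b = socket.socketpair()
idle = recv_or_timeout(b, 0.2)  # peer connected but silent: recv just waits
a.sendall(b"data")
got = recv_or_timeout(b, 0.2)
a.close()                       # peer closes its end
eof = recv_or_timeout(b, 0.2)   # only now does recv return b""
b.close()
print(idle, got, eof)           # None b'data' b''
```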

  6. Johannes Martin says:

    Thanks a lot for this! I have been looking for something like this for a long time.

    I had to make two minor modifications to your C code to make it work for me:
    1. Calls to parse_commandline() and init() have to be switched around for custom proxy IP addresses to work.
    2. Squid 3.4.8 replies with HTTP/1.1 rather than HTTP/1.0, so the code would think the reply was wrong and not establish the connection.

    During my tests I ended up with a “too many open files” message in the accept call, so I suspect in some error condition some handles aren’t closed correctly. I haven’t found out where yet though.
