There are many ways to determine what someone is doing online by analyzing their network traffic. Capturing network traffic and using Wireshark dissectors and statistics can help even when a large amount of the traffic is encrypted. A domain name system (DNS) query alone could provide enough information to act on. I had talked about determining what someone is doing online with non-technical friends before. This past month I received a text message from one of them, who essentially asked the following: Do I still need to use a virtual private network (VPN) if I select ‘require encrypted peers’ in my BitTorrent client? I had my answer (don’t illegally download anything and yes, use the VPN) but I didn’t have a technical reason for BitTorrent traffic specifically. I decided to experiment so that I could provide that technical reason and gain a more robust understanding of what the BitTorrent protocol is doing.
I regularly use BitTorrent for legal downloads. When Offensive Security released an updated Kali Linux 2017.2 virtual machine image this week, I used BitTorrent to download and share it. Another distribution I download this way is Raspbian. I decided to use qBittorrent, a Windows BitTorrent client, to test what using encryption among peers actually means.
The peer encryption setting is located under Preferences > BitTorrent > Encryption mode and defaults to Prefer Encryption. Below is a screen shot.

Control, Capture, Analyze
I will be doing null hypothesis testing: I will change the peer encryption option for two separate packet captures (pcaps) and compare them. Instead of using the default setting I will use disable encryption and require encryption. Using each configuration I will download the Raspbian system image and use hashing to ensure both downloads result in an identical file. I will then search each packet capture for a set of bytes from the file to see whether it appears in clear text. The null hypothesis (H0) is that the download exists in both pcaps in clear text. My hypothesis (H1) is that the download does not exist in the require encryption pcap as clear text. I am hoping to use Wireshark to reject the null hypothesis.
Using tcpdump and a port mirror I captured the network traffic and ended with these two pcaps:
filename, size, sha1
bt_disable_encryption.pcap, 432.4 MB, 65586ad1b23aeaccddbf17c99a7611e0f94c8e97
bt_require_encryption.pcap, 421.6 MB, 4c0024e9aa4ca9fcc0aadf4d9e8903e864e5a6b7
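SHA-1 values like the ones listed above can be computed for any file, whether a pcap or the downloaded image. The following is a minimal Python sketch rather than the exact tooling used here; the function name `sha1_of_file` and the chunk size are illustrative assumptions.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for pcaps that are
    hundreds of megabytes in size.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two downloads are byte-for-byte identical (barring a hash collision) when `sha1_of_file` returns the same digest for both paths.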
The resulting download from each was identical (as expected) and verified through hashing. I used the first 16 bytes from the download for filtering. I wouldn’t use this as a filter when given a large pcap, but mine is small and was a targeted capture. If I use too few bytes as a filter, I may be filtering for any *.zip file signature. If I use too much data, I may exceed the size of a payload in a single frame. I use xxd for a continuous byte dump and sed to put it in the format that tshark expects for the filter.
$ xxd -p -c 16 ~/2017-08-16-raspbian-stretch-lite.zip | head -n 1 | sed 's/\(..\)/\1:/g' | sed '$s/.$//'
50:4b:03:04:14:00:00:00:08:00:74:5c:10:4b:da:96
$ tshark -r ~/bt_disable_encryption.pcap frame contains 50:4b:03:04:14:00:00:00:08:00:74:5c:10:4b:da:96
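The same colon-separated byte string can be produced without xxd and sed. This is a small Python sketch of that formatting step only; the helper name `frame_contains_filter` is an assumption made for illustration, and the resulting string is meant to be passed to tshark as in the command above.

```python
def frame_contains_filter(data: bytes, length: int = 16) -> str:
    """Build a 'frame contains aa:bb:...' filter from the first bytes of data.

    Too few bytes may match any file with the same signature; too many
    may not fit inside a single frame's payload.
    """
    prefix = data[:length]
    # Format each byte as two lowercase hex digits, joined by colons.
    hex_bytes = ":".join(f"{byte:02x}" for byte in prefix)
    return f"frame contains {hex_bytes}"
```

In practice `data` would be the first few bytes read from the downloaded file, for example `open(path, "rb").read(16)`.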

We can see that frame 241939 contains the hex we filtered for. Opening the pcap in Wireshark and filtering on the frame allows us to see the bytes in the payload.

This is good news and what we expected. In the screen shot below we can see that using the same filter against the bt_require_encryption pcap also yields good results; it does not find the bytes in any of the frames.

I am not testing the payloads for entropy or signs of encryption, so the most correct statement I can make is that the download is not being sent as clear text when require encryption is selected in qBittorrent. This distinction is important, as it more accurately describes what I am doing. As a result, we reject the null hypothesis. The bt_require_encryption pcap does not contain the file in clear text.
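For readers curious what an entropy check might look like, here is a hedged Python sketch of Shannon entropy over a payload. It was not part of this test, and the function name `shannon_entropy` is illustrative. Note that a high value would not prove encryption in this case, because the download is a compressed zip file and compressed data also scores close to 8 bits per byte.

```python
import math
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Return the Shannon entropy of payload in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)  # occurrences of each byte value
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )
```

Values near 8 suggest data that is encrypted or compressed, while plain text typically scores much lower.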
Results
With the testing, I could determine two facts:
- When disable encryption is selected the data is transmitted in clear text.
- When require encryption is selected the data is not transmitted in clear text.
It was nice to see this in action but I wanted to explore the pcaps to see if there was more information to be had. All I looked for previously was the clear text transmission, which is only a part of the process of sharing data securely. I wanted to answer a new question:
Does using encrypted peers prevent an eavesdropper from determining the payload using only the BitTorrent protocol traffic? (Hint: No!)
Exploratory surgery
I decided to look at each pcap using Wireshark. I’m always amazed at how much information the built-in dissectors can parse out. I started noticing BitTorrent handshake traffic between peers. This appears to occur when peers communicate for the first time after completing a TCP three-way handshake. Both peers send a BitTorrent handshake that contains, among other data, a SHA1 hash of the info dictionary and a Peer ID. Below is a screen shot of a sample from the bt_require_encryption pcap.
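To make the handshake layout concrete, here is a minimal Python sketch that parses one from a raw TCP payload. It assumes the layout given in the BitTorrent Protocol Specification: a length byte of 19, the string "BitTorrent protocol", 8 reserved bytes, the 20-byte info_hash, and the 20-byte Peer ID. The names `Handshake` and `parse_handshake` are illustrative, not part of any library.

```python
from typing import NamedTuple, Optional

PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes in total


class Handshake(NamedTuple):
    reserved: bytes   # 8 reserved (extension) bytes
    info_hash: str    # SHA1 of the info dictionary, as 40 hex characters
    peer_id: bytes    # 20-byte Peer ID


def parse_handshake(payload: bytes) -> Optional[Handshake]:
    """Parse a BitTorrent handshake, or return None if payload is not one."""
    if len(payload) < HANDSHAKE_LEN:
        return None
    # The first byte is the protocol string length, then the string itself.
    if payload[0] != len(PSTR) or payload[1:20] != PSTR:
        return None
    return Handshake(
        reserved=payload[20:28],
        info_hash=payload[28:48].hex(),
        peer_id=payload[48:68],
    )
```

Any payload that does not begin with the expected protocol string yields `None`, which makes the function usable as a simple filter over candidate payloads.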

Peer ID
The Peer ID is a 20-byte value that is unique to each peer. According to the BitTorrent Protocol Specification, there are currently no guidelines for generating a Peer ID. The observed Peer IDs from both pcaps appeared to be in hex and it’s possible that a quick SHA1 hash of random input is used for generation.
SHA1 Hash of the Info Dictionary (info_hash)
The SHA1 Hash of the Info Dictionary was identical and present in both the bt_disable_encryption and bt_require_encryption BitTorrent handshakes. This is a huge red flag and piqued my interest. It makes sense that peers need to verify that they will exchange the correct data, but they appeared to be making this negotiation outside of any encrypted stream.
To better understand the meaning of the info_hash I wanted to find out what input is used to generate it. It isn’t a hash of the *.torrent file itself, and it isn’t a hash of the file(s) to be exchanged. It can be derived from the *.torrent file and it appears in the BitTorrent magnet uniform resource identifier (URI) link. After some searching, the BitTorrent Protocol Specification helped by defining it as a 20-byte SHA1 hash of the value of the info key from the metainfo file. The value of the info key from the metainfo file is a bencoded dictionary. I was able to parse out the bencoded dictionary and re-create the info_hash for the Raspbian torrent with some simple hex editing.
Knowing that BitTorrent traffic with the peer encryption required setting enabled still transmits the info_hash in clear text, I decided to see if I could accurately determine encrypted BitTorrent traffic contents by simply searching Google for the corresponding info_hash and browsing the top results. This worked.
The Process
Using tshark and bash I can filter info_hash values from BitTorrent handshakes. I could then query Google using those info_hash values and use the results to estimate the payload. I chose to automate this with Python 3 and BeautifulSoup. Below is a screen shot of output using the bt_require_encryption pcap as input:
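The extraction step might look something like the following Python sketch. It is not the script described here. It assumes tshark is on the PATH and that the BitTorrent dissector exposes the handshake hash as the field `bittorrent.info_hash`; the function names are illustrative, and the Google lookup is deliberately left out.

```python
import subprocess
from typing import List

HEX_DIGITS = set("0123456789abcdef")


def unique_info_hashes(tshark_output: str) -> List[str]:
    """Normalize tshark field output into unique 40-character hex hashes.

    Accepts values with or without colon separators and keeps the
    order in which each hash was first seen.
    """
    seen = {}
    for line in tshark_output.splitlines():
        # Several occurrences in one frame are comma-separated by default.
        for value in line.split(","):
            cleaned = value.strip().replace(":", "").lower()
            if len(cleaned) == 40 and set(cleaned) <= HEX_DIGITS:
                seen.setdefault(cleaned, None)
    return list(seen)


def info_hashes_from_pcap(pcap_path: str) -> List[str]:
    """Run tshark on a pcap and return the info_hash values it reports."""
    command = [
        "tshark", "-r", pcap_path,
        "-Y", "bittorrent.info_hash",      # only frames with a handshake hash
        "-T", "fields", "-e", "bittorrent.info_hash",
    ]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return unique_info_hashes(result.stdout)
```

Each returned value can then be searched for in a public index, which is the step that reveals what the traffic most likely contains.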

This proof of concept works against traffic with either encrypted or unencrypted BitTorrent peers.
Conclusion
You have a high likelihood of determining the contents of encrypted BitTorrent traffic by collecting info_hash values and using public databases. As a proof of concept I have created a Python script to automate looking through pcaps and returning the first five Google results of any info_hash present in the traffic.
To my friend: Keep using your VPN. 🙂