Scraping IMDb Futurama Episode User Ratings

Good news, everyone!

This entry is effectively a two-fer. It will show how I used some basic tools and a pinch of Python with numpy to get some of the data I needed for a class project. I took a look at my favorite television show, Futurama. I used the average Internet Movie Database (IMDb) user rating for each episode to see how many standard deviations away from the mean the top four episodes are. The ultimate goal of the project was different but this was a good way to use data to support facts.

Quick. Dirty. Scraping.

IMDb has a page with every Futurama episode and its average user rating. The URL is http://www.imdb.com/title/tt0149460/eprate?ref_=ttep_sa_2. Note that the direct-to-video movies are excluded (rightfully) from this list.

Let’s scrape that data!

$ curl -v 'http://www.imdb.com/title/tt0149460/eprate?ref_=ttep_sa_2' 2>&1 | egrep -i 'users rated this' | cut -d' ' -f5 | cut -d'/' -f 1 > /tmp/scores.txt

Oddly enough the OSCP labs had me scrape this way frequently. I didn’t have the time to push the labs hard or take the practical but some information stuck. I’ll hopefully get back to that OSCP soon. 🙂

The above gives us a file (/tmp/scores.txt) with each Futurama episode user score on a new line. All I really want is the mean and standard deviation anyway. It’s easy to do with the Python interpreter.

>>> import numpy as np
>>> scores = []
>>> for line in open('/tmp/scores.txt', 'r'):
...   scores.append(float(line.strip()))
... 
>>> scores = np.array(scores)
>>> np.mean(scores)
7.8798387096774185
>>> np.std(scores)
0.58612723471593964

The mean is ~7.88 and the standard deviation is ~0.59 — I used this information to compare the top four episodes. (The highest rated episode is 2.745 standard deviations away from the mean!)
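As a minimal sketch of that comparison, the snippet below computes how many standard deviations each of the top four scores sits above the mean. The values in `scores` are made-up placeholders, not the real IMDb data. Note that `np.std` defaults to the population standard deviation (`ddof=0`), which is what the session above used.

```python
import numpy as np

# Placeholder ratings for illustration only; the real values live in /tmp/scores.txt.
scores = np.array([7.1, 7.4, 7.6, 7.8, 7.9, 8.0, 8.2, 8.5, 9.0, 9.5])

mean = np.mean(scores)
std = np.std(scores)  # population standard deviation (ddof=0), as in the session above

# Sort descending and keep the four highest-rated episodes.
top_four = np.sort(scores)[::-1][:4]

# z-score: distance from the mean measured in standard deviations.
z_scores = (top_four - mean) / std

for score, z in zip(top_four, z_scores):
    print(f"{score:.1f} is {z:.3f} standard deviations above the mean")
```

To run it against the scraped data, `scores = np.loadtxt('/tmp/scores.txt')` should be able to replace the placeholder array.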

For those interested here’s a screenshot of the above in action:

[Screenshot of the terminal session, taken 2017-03-28]


Amazon Echo Dot System Image

My friend sent me a second generation Amazon Echo Dot as a holiday gift. It sounded like a good little opportunity to get the ball rolling on some modeling for RiotPSA. Plus I really wanted one!

I began by capturing Dot network traffic just to see what’s going on. I set up a port mirror on a little 5 port switch to capture all of the Dot traffic. I ended up using tshark for ring buffers. The goal was to capture the initial setup of the Dot and then about an hour of no activity. I would keep the Dot connected but I would not say the Dot trigger word. (The default is Alexa.) After only a few minutes I had captured over 250MB! Something fun was going on.
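For anyone who wants to reproduce a ring-buffer capture, here is a rough sketch that builds a tshark command line from Python. The interface name, file size, file count, and output path are all assumptions for illustration. They are not the options I used for the capture described above.

```python
import shlex

def build_tshark_ring_command(interface, out_path, filesize_kb, max_files):
    """Build a tshark command line that writes to a ring buffer of capture files."""
    return [
        "tshark",
        "-i", interface,                  # capture interface (assumed name)
        "-b", f"filesize:{filesize_kb}",  # switch to a new file after this many kB
        "-b", f"files:{max_files}",       # keep at most this many files, overwriting the oldest
        "-w", out_path,                   # base name for the capture files
    ]

# Hypothetical values: none of these come from the original capture.
cmd = build_tshark_ring_command("eth0", "/tmp/dot.pcapng", 102400, 10)
print(shlex.join(cmd))
# To actually start capturing, hand the list to subprocess.run(cmd, check=True).
```

Building the command as a list keeps the arguments separate, so nothing needs shell quoting if it is later passed to `subprocess.run`.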

My suspicion was that the Dot downloaded and applied a system update. This turned out to be true.

Using Wireshark

After the hour of capturing I used Wireshark to do a quick analysis of the data. I gravitate to sorting and filtering. I sorted conversations by bytes in Statistics > Conversations. The majority of IPv4 data (about 270MB) was between the Dot and 72.195.165.64.
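The idea behind sorting conversations by bytes can be sketched in a few lines of Python. This is only an illustration of the concept, not how Wireshark does it internally. The packet records are invented, and 192.168.1.50 is a hypothetical stand-in for the Dot’s address.

```python
from collections import Counter

# Hypothetical (source, destination, frame length) records; 192.168.1.50 stands in for the Dot.
packets = [
    ("192.168.1.50", "72.195.165.64", 74),
    ("72.195.165.64", "192.168.1.50", 1514),
    ("72.195.165.64", "192.168.1.50", 1514),
    ("192.168.1.50", "192.168.1.1", 90),
]

bytes_per_conversation = Counter()
for src, dst, length in packets:
    # Sort the endpoints so A -> B and B -> A count as the same conversation.
    key = tuple(sorted((src, dst)))
    bytes_per_conversation[key] += length

# most_common() orders conversations by total bytes, largest first.
for (a, b), total in bytes_per_conversation.most_common():
    print(f"{a} <-> {b}: {total} bytes")
```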

I filtered by right-clicking the conversation and selecting Apply as Filter > Selected > A <-> B. In the Wireshark packet list pane I checked out the traffic and decided to filter on the Stream Index of 29.

A nice, clean conversation. 🙂 There was a successful three-way handshake between the Dot and the remote host (72.195.165.64) followed by an HTTP GET request for a file named update-kindle-full_biscuit-272.5.6.4_user_564196920.bin. (I found out that 564196920 is the software version number from https://www.amazon.com/gp/help/customer/display.html?nodeId=201602210.) I copied the full request URI from frame 3546 (http://amzdigitaldownloads.edgesuite.net/obfuscated-otav3-9/1b9718ec2da663bb299676df977055c9/update-kindle-full_biscuit-272.5.6.4_user_564196920.bin) and used wget to download the file.

It worked! I didn’t have to specify a user agent, spoof any information, etc. (And now you have a way to download the system image for analysis as well!)

I wanted to pull the same file out of the pcapng file. I exported the stream using right-click > Follow > TCP stream > Show data as RAW (drop-down list) > Save as… and saved it as ~/tcpstream29.raw. This export should include the HTTP GET request. xxd confirms its presence.

At this point we just want the file. The offset we’re looking for is easy to pick out due to a pretty recognizable file signature (0x504b0304) shortly after the ASCII keep-alive. A carver would be handy but the PK allows me to eyeball it.
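For a scripted alternative to eyeballing the offset, the rough sketch below finds the first ZIP local file header signature and keeps everything from there on. The `stream` bytes are a fabricated stand-in for the exported stream, and the function name is my own. It also assumes the first `PK\x03\x04` really is the start of the file, which would fail if those bytes happened to appear earlier in the HTTP headers.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # 0x504b0304, the ZIP local file header signature

def carve_from_signature(raw: bytes, signature: bytes = ZIP_LOCAL_HEADER) -> bytes:
    """Return raw starting at the first occurrence of signature."""
    offset = raw.find(signature)
    if offset == -1:
        raise ValueError("signature not found")
    return raw[offset:]

# Stand-in for the exported stream: a fake HTTP exchange followed by fake ZIP bytes.
stream = (
    b"GET /update.bin HTTP/1.1\r\n\r\n"
    b"HTTP/1.1 200 OK\r\nConnection: keep-alive\r\n\r\n"
    b"PK\x03\x04rest-of-file"
)
carved = carve_from_signature(stream)

# Show how many leading bytes were dropped and the first bytes that remain.
print(hex(len(stream) - len(carved)), carved[:4])
```

For the real export, the bytes could be read from ~/tcpstream29.raw and the result written back out, keeping in mind that this loads the whole file into memory.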

I used a hex editor (Bless) to remove everything before offset 0x243.

It’s dead simple. Highlight what we want to remove and hit delete. I saved it as ~/tcpstream29_edited.raw. This file should be identical to the one I downloaded earlier with wget, which I verified by hashing each and comparing them. In case it helps the hashes are below:

  • SHA1(update-kindle-full_biscuit-272.5.6.4_user_564196920.bin)= e897fef9384220cb60bd6f385c328f57408cd5f5
  • SHA512(update-kindle-full_biscuit-272.5.6.4_user_564196920.bin)= cc92c85e08ce412dfbe14562e8df76cdd600da60d3f9245decabb9d65e92b473d07db11559a8b2ffe56e525d0050245dfd0d2c1d0dd23e47d14dee9dd911b01a
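The hash comparison can be sketched with Python’s hashlib as well. This is not the tool that produced the hashes above, just an equivalent approach. The two temporary files are placeholders standing in for the wget download and the edited stream export.

```python
import hashlib
import os
import tempfile

def file_digest(path, algorithm, chunk_size=1024 * 1024):
    """Hash a file in chunks so a large image does not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Two placeholder files with identical content stand in for the real images.
with tempfile.TemporaryDirectory() as tmp:
    downloaded = os.path.join(tmp, "downloaded.bin")
    carved_copy = os.path.join(tmp, "carved.bin")
    for path in (downloaded, carved_copy):
        with open(path, "wb") as handle:
            handle.write(b"placeholder image bytes")
    sha1_match = file_digest(downloaded, "sha1") == file_digest(carved_copy, "sha1")
    sha512_match = file_digest(downloaded, "sha512") == file_digest(carved_copy, "sha512")

print("SHA1 match:", sha1_match)
print("SHA512 match:", sha512_match)
```

Pointing `file_digest` at the real paths should give hex digests that can be compared directly against the two values listed above.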

Inside

I loaded the file into X-Ways so I could explore, filter, comment, etc. This quickly turned up enough information to warrant its own post, so I won’t go over everything here. It is worth noting some things that stuck out from the get-go:

  • The Dot runs Android or at least a modified version of it.
  • There’s a bash script for iptables firewall configuration. There is an initial flushing of the tables and a default deny is put in place after. I will look at the rules and scan against them to verify that the script is run once I enter a more active phase of information gathering.
  • The Dot also implements Bluetooth blacklisting to specifically prevent automobiles from automatically pairing with it.
  • The system update comes with two firmware packages for integrated components. I’ll run these through strings, binwalk, and bulk_extractor to see if anything fun comes up.

Again – I’ll go over these items (and more) in follow-up posts.

Thoughts of an Echo Dot compromise

The idea of fully compromising an Amazon Echo Dot crossed my mind. Here are a few thoughts I have.

Alter the system update to suit your goal

Altering the firewall bash script and including binaries in a repacked copy of the update is possible. If the Echo Dot does not require that system images be digitally signed then the system update has a chance at being loaded/run on an Echo Dot if presented correctly. I don’t know if the system packages are digitally signed or if signing is required. Absence of evidence is not evidence of absence. Right now I just do not know because I haven’t looked.
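One cheap first look, sketched below, is to list whatever sits under META-INF/ in the archive, since JAR-style signed ZIP packages usually keep their signing files there. Whether this update follows that convention is an assumption on my part. Finding such entries would not prove the Dot enforces verification, and finding none would not prove the opposite. The archive built here is an in-memory stand-in with hypothetical member names.

```python
import io
import zipfile

def list_meta_inf_entries(zip_source):
    """Return archive member names under META-INF/, where JAR-style signing files usually live."""
    with zipfile.ZipFile(zip_source) as archive:
        return [name for name in archive.namelist() if name.startswith("META-INF/")]

# Build a tiny in-memory archive as a stand-in; these member names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    archive.writestr("system/build.prop", "placeholder\n")

print(list_meta_inf_entries(buffer))
```

The same function should accept a path to the downloaded .bin file in place of the in-memory buffer, provided the file really is a plain ZIP archive.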

Trick the Echo Dot into downloading and running a bad update

I believe I tracked down the TCP conversation in my capture file that alerted the Echo Dot that an update was available and where to find it. I started at the tcp.stream of the system update download itself and worked backwards using the IP addresses, DNS queries/answers, etc. Unfortunately (or fortunately really) the TCP stream of interest contains an encrypted conversation. I cannot see inside it to verify.

Effort and insight would be required to discover how the Echo Dot is specifically told an update exists. More would be needed still to manufacture a way to trick an Echo Dot into downloading and running a system image not created and made available by Amazon. Later on I’ll be using The OWASP Zed Attack Proxy (ZAP) and the Burp Suite to work on this.

A vulnerability in a listening process

The Echo Dot has a firewall with a deliberate set of rules. Bugs are organic to the development process and it’s possible that a permitted, listening process has a vulnerability. Keep in mind that the thoughts above are anything but unique to the Amazon Echo Dot. This is basic stuff.

Why so much has not been done

I’d love to explore the system update more but the spring semester of college just started and my hands are full. Being an older, non-traditional student means that I’m taking classes a little more seriously and likely putting unnecessary pressure on myself. I’m taking 19 credits and need to focus on starting the semester on the right foot!

Hopefully in the coming weeks I’ll have more time and resources to look more into this system and network traffic. Until then I realized that there wasn’t any reason to not share the system update, specifically the download link. I couldn’t find it on Google so I figure it just hasn’t been posted yet.

I’ll be sure to update the blog when I explore the system update and network traffic some more. Thanks!


A Quick Note

I haven’t forgotten about this blog or stopped pursuing topics to blog about. I haven’t posted recently (and likely won’t for about another ten days) because I attended CactusCon 2016 and now have to deal with finals. It’s my first semester back in school and I want to give them my full attention.

Thanks for understanding! More fun stuff will be here soon. 🙂
