Seeing the Light

There is light at the end of the tunnel.

So much has happened over the past few years that it’s odd to think about closure paired with continued success. With any luck I’ll be wrapping up my master’s program at the end of the year and entering my second year of employment at Kroll. I’ve also been remodeling my home. This isn’t a humblebrag post. If you enjoy doing something, it doesn’t automatically become easy. This year has been difficult and not uniformly successful. Learning experiences? Yes. Fuck-ups? Also yes. Running water? Occasionally. At least no one died as a result.*

I’m excited to get back to punk-rock computing: using my free time to research and test what I want, how I want, and to blog about it. That’s why this post exists. I’m kicking the rust off the ol’ WordPress install to make sure it still works (I’ve been paying someone to maintain it… did they?) and that I know how to hit the “Publish” button with just the right amount of intensity. (About 4.63 intensities or more.)

There are many new tools I’ve started using since I last posted that I’d love to write about: Python 3 + pandas, KAPE, DeepBlueCLI, and others. I look forward to posting about how I use them, what I use them for, and what I think the future of DFIR could look like. Also about my life, what’s on my mind, and a few links to weird websites that remind me of how the Internet was in 1996. Webrings, anyone? More posts soon!

* That I am aware of.



Disruptive Technology Theory

Disruptive Technology Theory has come up frequently in my coursework, and it is largely misunderstood or errantly attributed to firms or ideas simply because they are successful. There’s a collection of articles I will post soon (I’m finishing up a summer class at the University of Arizona this week!), but I thought this podcast episode warranted a share on its own. Here is a link and description:

The Disruptive Voice, episode 15: Is Uber Disruptive?

Is Uber disruptive? We asked five experts on the theory of disruptive innovation this question and received varying responses, yet their prescriptions for what lies ahead for Uber and the incumbent taxi companies vary less than you might think. In this episode, we revisit Professor Clay Christensen’s December 2014 article in Harvard Business Review, “What Is Disruptive Innovation,” with co-author Rory McDonald, Innosight Managing Partner Scott D. Anthony, Christensen Institute co-founder Michael B. Horn, and Forum Senior Researchers Tom Bartman and Efosa Ojomo. Also discussed: the platform business model through the lens of disruptive innovation and what’s next for Uber.


Using ScanSnap Manager to OCR non-ScanSnap PDFs

I had some PDFs that I wanted to perform optical character recognition (OCR) processing on. I have a Fujitsu ScanSnap and wanted to use the ScanSnap Manager software to do this. The management software checks supplied PDFs and will only perform processing on those which originated from ScanSnap hardware. I wanted to circumvent this, and it ended up being easy.

PDFs created with a ScanSnap have a Creator metadata tag containing the scanner’s model string. You can use ExifTool by Phil Harvey to print and modify this metadata. For example:

$ exiftool -creator ~/example.pdf
Creator                         : ScanSnap Manager #iX500

The file example.pdf has the correct tag/value pair and will be processed. The next file, covfefe.pdf, does not. You can add/modify the tag to the PDF which did not originate from a ScanSnap.

$ exiftool -creator="ScanSnap Manager #iX500" ~/covfefe.pdf 
    1 image files updated
$ exiftool -creator ~/covfefe.pdf
Creator                         : ScanSnap Manager #iX500

Voila! The ScanSnap Manager software will now process the PDF. You can certainly use free OCR software, but I didn’t find any of them to be quite as slick. Plus this was more fun. 🙂
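If you have a whole folder of PDFs to convert, the same trick can be scripted. Here’s a minimal Python sketch that shells out to ExifTool; the function names and directory handling are my own, and the iX500 Creator string matches the example above, so adjust it for your scanner model.

```python
# Batch version of the trick above: stamp every PDF in a directory with
# the ScanSnap Creator tag so ScanSnap Manager will accept them for OCR.
# Assumes exiftool is installed and on your PATH.
import subprocess
from pathlib import Path

SCANSNAP_CREATOR = "ScanSnap Manager #iX500"  # match your scanner model


def exiftool_args(pdf_path):
    """Build the exiftool command that writes the Creator tag."""
    return ["exiftool", f"-creator={SCANSNAP_CREATOR}", str(pdf_path)]


def tag_pdfs(directory):
    """Tag every PDF in a directory (exiftool keeps *_original backups)."""
    for pdf in sorted(Path(directory).glob("*.pdf")):
        subprocess.run(exiftool_args(pdf), check=True)

# usage: tag_pdfs("/path/to/scans")
```

ExifTool leaves a `covfefe.pdf_original` backup beside each file it rewrites, which is handy if you want to undo the tagging later.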


Welcome to the World of Tomorrow! (Again)

I wanted to follow up on my previous post about scraping Futurama episode ratings from IMDb. I used tools I was familiar with to get the job done, but someone told me I really should check out BeautifulSoup and do it all in Python. It ended up working great, and I’ll continue to use BeautifulSoup for web scraping in the future. This is what I did in the IPython interpreter:

import re, requests
import numpy as np
import pandas as pd
import scipy.stats as stats
from bs4 import BeautifulSoup

# create soup object

r = requests.get("")  # the IMDb episode-ratings URL goes here
soup = BeautifulSoup(r.content, "html.parser")

# scrape scores

scores = []
for score in soup.find_all("td", {"align": "right", "bgcolor": "#eeeeee"}):
    scores.append(float(score.get_text()))

# scrape episodes

titles = []
for title in soup.find_all("a", {"href": re.compile(r"/title/tt")}):
    if len(title["href"]) == 17:
        titles.append(title.get_text())

cols = ["IMDb Rating"]

# build dataframe

frame = pd.DataFrame(scores, titles, cols)

# maths with numpy

np.mean(scores)
np.std(scores)

# maths with pandas

s = pd.Series(scores)
s.mean()
s.std()

# test for normal distribution

stats.normaltest(scores)

The ISTA 350: Programming for Informatics Applications course at the University of Arizona helped me a lot after my initial post. Additionally, the book Web Scraping with Python by Ryan Mitchell is one I’d recommend keeping handy.
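If you want to play with the same pattern without hitting IMDb, the selectors work just as well against inline HTML. The two-row table below is a made-up stand-in for the real ratings page (titles and values are illustrative only):

```python
import re
from bs4 import BeautifulSoup

# A tiny stand-in for the IMDb ratings table (values are made up).
html = """
<table>
  <tr><td><a href="/title/tt0000001/">Episode One</a></td>
      <td align="right" bgcolor="#eeeeee">9.1</td></tr>
  <tr><td><a href="/title/tt0000002/">Episode Two</a></td>
      <td align="right" bgcolor="#eeeeee">8.9</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# rating cells, using the same attribute filter as the real scrape
scores = [float(td.get_text())
          for td in soup.find_all("td", {"align": "right",
                                         "bgcolor": "#eeeeee"})]

# episode titles from the /title/tt... links
titles = [a.get_text()
          for a in soup.find_all("a", {"href": re.compile(r"/title/tt")})]

print(dict(zip(titles, scores)))  # {'Episode One': 9.1, 'Episode Two': 8.9}
```

Working against a local snippet like this is also a polite way to develop a scraper before pointing it at a live site.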


Scraping IMDb Futurama Episode User Ratings

Good news, everyone!

This entry is effectively a two-fer: it shows how I used some basic tools and a pinch of Python with numpy to get some of the data I needed for a class project. I took a look at my favorite television show, Futurama. I used the average Internet Movie Database (IMDb) user rating for each episode to see how many standard deviations away from the mean the top four episodes are. The ultimate goal of the project was different, but this was a good way to use data to support facts.

Quick. Dirty. Scraping.

IMDb has a page listing every Futurama episode and its average user rating. Note that the direct-to-video movies are (rightfully) excluded from this list.

Let’s scrape that data!

$ curl -v 2>&1 | egrep -i 'users rated this' | cut -d' ' -f5 | cut -d'/' -f 1 > /tmp/scores.txt

Oddly enough, the OSCP labs had me scraping this way frequently. I didn’t have the time to push the labs hard or take the practical, but some information stuck. I’ll hopefully get back to the OSCP soon. 🙂

The above gives us a file (/tmp/scores.txt) with each Futurama episode’s user score on its own line. All I really want is the mean and standard deviation anyway, and that’s easy to get with the Python interpreter.

>>> import numpy as np
>>> scores = []
>>> for line in open('/tmp/scores.txt', 'r'):
...   scores.append(float(line.strip()))
>>> scores = np.array(scores)
>>> np.mean(scores)
>>> np.std(scores)

The mean is ~7.88 and the standard deviation is ~0.59 — I used this information to compare the top four episodes. (The highest rated episode is 2.745 standard deviations away from the mean!)
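That “standard deviations away from the mean” figure is just a z-score: (score − mean) / standard deviation. Here’s a quick sketch of the calculation; the sample ratings below are made up for illustration, not the real episode list.

```python
import numpy as np


def z_score(x, scores):
    """How many standard deviations x sits from the mean of scores."""
    scores = np.asarray(scores, dtype=float)
    return (x - scores.mean()) / scores.std()

# illustrative values only -- not the real episode ratings
sample = [7.9, 8.1, 6.9, 7.4, 8.0, 7.6, 9.3]
print(z_score(9.3, sample))
```

Plugging in the real list gives the 2.745 figure for the top-rated episode.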



The Amazon Echo Dot has a script

The script’s existence is not proof that it is used, but the expanded speculation around it is a fun exercise. The only fact I can offer about the /bin/ script is that it exists. (For now.)

I can say something different for another script,, on the system. I audited the device against it with nmap, and I enabled features on the Dot (such as using Spotify, which opens TCP 4070) to test the script’s execution and logic. The ability to audit the script and observe behavior is crucial, and the data supports that the script is used. (More data would be better!) The images below were part of that audit: TCP 4070 open after enabling Spotify, followed by a quick banner grab.
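Nmap works fine for this, but the same port check and banner grab is only a few lines of standard-library Python. The address in the usage comment is a placeholder for wherever your Dot lives on the local network.

```python
import socket


def grab_banner(host, port, timeout=3.0, nbytes=1024):
    """Connect to host:port and return whatever the service sends first.

    Raises on a closed port; returns b"" if the port is open but the
    service never volunteers a banner.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        try:
            return sock.recv(nbytes)
        except socket.timeout:
            return b""  # port is open, but the service stayed quiet

# usage (placeholder address for the Dot):
# print(grab_banner("192.168.1.50", 4070))
```

Not every service sends a banner unprompted, so an empty result only tells you the port is open, not what is listening.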

Unfortunately, I’m unable to do the same level of observation with the script. I can’t knowingly trigger it, I don’t have a way to image an Amazon Echo Dot, and I don’t have a way to remotely connect and monitor its activity. The script appears to create new memdump logs in the /data/system/dropbox directory. I would love to know the fate of these logs and of anything else in /data/system/dropbox.

If you want a copy of the script, you can download the system image via my post Amazon Echo Update 567200820 (And where to download it!). Discovery of this script and other fun within the system happened in late 2016 and early 2017. It’s been fun. 🙂

It’s worth noting that Ars Technica recently ran the story of Amazon refusing to hand over data on whether Alexa overheard a murder, which puts a good perspective on the information one could (possibly) get from Amazon about an Echo Dot user if they were motivated to do so. It’s a continuation of the Echo’s involvement in a murder case from 2016.

I wish I had more time to work on this system. Unfortunately, taking 19 credits this semester has proven to be the challenge I was expecting. It’s something I still give attention to, but not at the level of intensity I would like. Hopefully this summer I can focus on it quite a bit more.
