Monday, August 10, 2015

PFP - A Python Interpreter for 010 Templates

I am excited to finally announce a project I have been slowly working on for at least five months now: pfp (docs).

PFP stands for Python Format Parser and is a Python-based interpreter for SweetScape's 010 Editor templates. PFP takes an input stream and an 010 Editor template and returns a modifiable DOM of the parsed data:

#!/usr/bin/env python
# encoding: utf-8

import os
import pfp
from pfp.fields import PYSTR,PYVAL
import sys

template = """
    BigEndian();
    
    typedef struct {
        // null-terminated
        string label;

        char comment[length - sizeof(label)];
    } TEXT;

    typedef struct {
        uint length<watch=data, update=WatchLength>;
        char cname[4];

        union {
            char raw[length];

            if(cname == "tEXt") {
                TEXT tEXt;
            }
        } data;
        uint crc<watch=cname;data, update=WatchCrc32>;
    } CHUNK;

    uint64 magic;

    while(!FEof()) {
        CHUNK chunks;
    }
"""

png = pfp.parse(
    data_file="~/Documents/image.png",
    template=template,
)

for chunk in png.chunks:
    if chunk.cname == "tEXt":
        print("Comment before: {}".format(chunk.data.tEXt.comment))
        chunk.data.tEXt.comment = "NEW COMMENT"
        print("Comment after: {}".format(chunk.data.tEXt.comment))

with open("/tmp/test.png", "wb") as f:
    png._pfp__build(f)

The above example will use the simple PNG template to parse a png image and change the comment, while keeping length and checksum values correct.

For those who are completely unfamiliar with 010 Editor templates: they parse data by declaring variables. Every variable that is declared (unless prefixed with const or local) parses its own size's worth of data from the input stream. For example, declaring a four-byte character array will parse four bytes from the input stream and display them as a character array.
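
To make the "declaring is parsing" idea concrete, here is a rough pure-Python analogy that uses only the standard library. It is not how pfp works internally; the declare helper and the sample bytes are made up for illustration. Each call plays the role of one template declaration and consumes exactly that many bytes from the stream.

```python
import io
import struct


def declare(stream, fmt):
    """Read exactly as many bytes as `fmt` describes and unpack them.

    Hypothetical helper: it stands in for a single template declaration.
    """
    size = struct.calcsize(fmt)
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError("not enough data for {!r}".format(fmt))
    return struct.unpack(fmt, raw)[0]


# Made-up big-endian input: a uint length followed by a four-byte name
stream = io.BytesIO(b"\x00\x00\x00\x0dIHDRrest-of-file")

length = declare(stream, ">I")   # roughly `uint length;`
cname = declare(stream, ">4s")   # roughly `char cname[4];`

print(length, cname, stream.tell())
```

After the two "declarations" the stream position has advanced by eight bytes, which is the behavior the paragraph above describes.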

Installation

PFP can be installed via pip:
pip install pfp

Motivation

My main motivation for writing pfp was to be able to use the large number of already-existing 010 templates from Python. The 010 Editor GUI is great for simple modifications, but it does not expose an API and does not have a way (that I know of) to auto-update length calculations, checksums, or parse compressed/encoded data. I used to think that 010 Editor was only available on Windows, but I have recently found out it is available on Mac and Linux as well.

PFP has added some extensions to the standard 010 Editor special attributes (what I call metadata in pfp) to allow fields to auto-update their value based on the values of other fields. Metadata extensions also exist in PFP to pack/unpack structures within compressed or encoded data.
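
As a rough illustration of what the watch/update metadata buys you, the sketch below mimics the idea in plain Python: a length and a CRC that are always derived from the fields they "watch". The Chunk class is hypothetical and is not pfp's implementation; it only assumes a PNG-style layout (big-endian length, four-byte name, data, CRC32 over name plus data).

```python
import struct
import zlib


class Chunk(object):
    """Toy PNG-style chunk whose length and crc follow its data.

    Hypothetical class for illustration only; not part of pfp.
    """

    def __init__(self, cname, data):
        self.cname = cname
        self.data = data

    @property
    def length(self):
        # plays the role of update=WatchLength watching `data`
        return len(self.data)

    @property
    def crc(self):
        # plays the role of update=WatchCrc32 watching `cname;data`
        return zlib.crc32(self.cname + self.data) & 0xFFFFFFFF

    def build(self):
        # big-endian: uint length, 4-byte name, data, uint crc
        return (
            struct.pack(">I", self.length)
            + self.cname
            + self.data
            + struct.pack(">I", self.crc)
        )


chunk = Chunk(b"tEXt", b"Comment\x00old")
before = chunk.build()

# Changing the data is enough; length and crc are recomputed on build
chunk.data = b"Comment\x00NEW COMMENT"
after = chunk.build()
```

In pfp the same effect comes from the metadata on the field declaration, so the template author does not have to write any of this bookkeeping by hand.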

Read more about metadata in pfp in the metadata documentation.

Uses

  • Fuzzing
  • General data format modification
  • Data format visualization
  • etc.

Implementation

010 template scripts use a modified C syntax. The main differences are that they allow control-flow statements within struct declarations, and that metadata attributes can be declared as part of a declaration:
struct {
    uchar len<watch=data,update=WatchLength>;
    if(len == 2) {
        short data;
    } else {
        char data[len];
    }
} some_struct;
The first step to implementing PFP was to create an 010 template parser. Since the syntax is so similar to C's syntax, I forked Eli Bendersky's pycparser project and modified it to be able to parse 010 templates. The result is py010parser.

py010parser returns an abstract syntax tree (AST) after parsing a template, which pfp then interprets by iterating over every node in the AST. Writing the interpreter was surprisingly easy, if a tad tedious. I had gotten some inspiration for how to set things up from how Firefox's and Chrome's JavaScript interpreters work.
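
For readers who have not written a tree-walking interpreter before, here is a minimal sketch of the general technique: walk the nodes and dispatch on each node's type. The Decl and If node types and the Interp class are hypothetical and far simpler than py010parser's real AST or pfp's interpreter; the hand-built tree loosely mirrors the some_struct example above, without the else branch.

```python
import io
import struct
from collections import namedtuple

# Hypothetical node types; py010parser's real AST classes differ.
Decl = namedtuple("Decl", ["name", "fmt"])
If = namedtuple("If", ["field", "equals", "body"])


class Interp(object):
    def __init__(self, stream):
        self.stream = stream
        self.fields = {}

    def run(self, nodes):
        for node in nodes:
            # dispatch on the node's class name, e.g. Decl -> handle_Decl
            handler = getattr(self, "handle_" + type(node).__name__)
            handler(node)
        return self.fields

    def handle_Decl(self, node):
        # a declaration consumes bytes from the stream
        size = struct.calcsize(node.fmt)
        raw = self.stream.read(size)
        self.fields[node.name] = struct.unpack(node.fmt, raw)[0]

    def handle_If(self, node):
        # control flow inside a struct: only descend if the test holds
        if self.fields[node.field] == node.equals:
            self.run(node.body)


ast = [
    Decl("len", ">B"),                   # uchar len;
    If("len", 2, [Decl("data", ">h")]),  # if(len == 2) { short data; }
]
fields = Interp(io.BytesIO(b"\x02\x00\x2a")).run(ast)
print(fields)
```

The real interpreter also has to handle scopes, typedefs, function calls and so on, which is where the tedium comes in.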

One of the benefits to having the interpreter written in python is that you can now expose native python functions to 010 templates:
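import pfp.fields
from pfp.native import native
from pfp.fields import PYVAL

# expose sum_numbers to templates under the name "Sum"
@native(name="Sum", ret=pfp.fields.Int64)
def sum_numbers(params, ctxt, scope, stream, coord):
    res = 0
    for param in params:
        # PYVAL converts a pfp field into its plain python value
        res += PYVAL(param)
    return res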

The sum_numbers python function will be callable from templates as the Sum function. See the functions documentation for more specifics.

Debugger

As I moved from simple template scripts to more complicated ones, it became increasingly difficult to debug errors in my interpreter without an 010 template debugger. So I wrote a template debugger using one of my favorite python modules, the cmd module (one of my other recent-favorites is the sh module):

[image: pfp debugger]

You can drop into the interactive debugger by calling Int3() anywhere in a template script. See the debugger documentation for more details.
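
If you have never used the cmd module, the sketch below shows the basic pattern it offers: subclass cmd.Cmd and add do_* methods, one per command. The ToyDebugger class, its commands and the fields dictionary are illustrative assumptions, not pfp's actual debugger code. The session is scripted through stdin so the example runs without a terminal.

```python
import cmd
import io


class ToyDebugger(cmd.Cmd):
    """Hypothetical mini-debugger; commands and names are illustrative."""

    prompt = "dbg> "

    def __init__(self, fields, **kwargs):
        cmd.Cmd.__init__(self, **kwargs)
        self.fields = fields

    def do_peek(self, arg):
        """peek NAME: show the value of a parsed field"""
        self.stdout.write("{!r}\n".format(self.fields.get(arg.strip())))

    def do_continue(self, arg):
        """continue: leave the debugger and resume"""
        return True  # a true return value ends cmdloop()


# Scripted session: feed two commands and capture the output
out = io.StringIO()
dbg = ToyDebugger(
    {"len": 2},
    stdin=io.StringIO("peek len\ncontinue\n"),
    stdout=out,
)
dbg.use_rawinput = False  # read from self.stdin instead of input()
dbg.cmdloop()

print(out.getvalue())
```

The docstrings double as help text, so typing help peek at the prompt works for free.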

Vim Plugin

Since vim is my editor of choice (and probably what hackerman uses), I wrote a vim plugin (pfp-vim) to visualize data formats using pfp:
[image: pfp-vim plugin]

pfp-vim exposes two commands:
  • :PfpInit - creates ~/.pfp with info about where your templates are stored
  • :PfpParse - parses the current buffer using the template that you choose

Reliability, Bugs, and Testing

I am making a strong effort to have pfp be as stable and reliable as possible. There are currently 110 test cases for the features in pfp. If/when you have a problem with pfp, please submit an issue on github. Pull requests are also always welcome.

laters,

--d0c


1 comment:

  1. Wow this is perfect! I will definitely check this out. I was actually thinking of doing that on my own, having recently discovered the great 010 Editor :)
    I was reluctant to spend time on writing 010 Templates, as they are of very limited use (only in 010 Editor), but being able to use them from python makes them infinitely more valuable.

    Are you planning on creating an Emacs plugin, too? ;)
