Showing posts with label Python. Show all posts
Showing posts with label Python. Show all posts

Wednesday, May 4, 2016

Read from file: Length-prefixed protocol buffers

File format

In short it’s binary encoded protobuf message prefixed with the size. For longer description please check Length-prefix framing for protocol buffers.

In most languages the file is read as stream. If you need to open and parse a file which holds lenght-prefixed protobuf messages in Python, you read internet and you find a code which read the whole file. It’s huge bottleneck. The hit in a performance between reading whole file then parsing byte after byte and using BufferedReader was in 1000x. Thus enjoy my little piece of code:

def ReadItm(fname, constructor, size_limit = 0):
    ''' Reads and parses a length prefixed protobuf messages from file. 
        The file MUST not be corrupted. The parsing is equivalent to parseDelimitedFrom.
    '''
    f = None
    if fname.endswith('.gzip'):
        f = gzip.open(fname, 'rb')
    else:
        f = open(fname, 'rb')
    reader = BufferedReader(f)
    bytes_read = 0
    while size_limit<=0 or bytes_read<size_limit:
        buffer = reader.peek(10)
        if len(buffer) == 0:
            break
        (size, position) = decoder._DecodeVarint(buffer, 0)
        reader.read(position)
        itm = constructor()
        itm.ParseFromString(reader.read(size))
        bytes_read = bytes_read + position + size
        yield itm
    f.close()

Thursday, May 28, 2015

pep8

I've got very comfortable with clang-format for C++ and gofmt for Go. Every time I have to use some other language i'm missing them very much. Today I've found a solution for Python. It makes life so much easier. The two awesome tools are:

pep8

pep8 is a tool to check your Python code against some of the style conventions in PEP 8.
https://pypi.python.org/pypi/pep8

autopep8

autopep8 automatically formats Python code to conform to the PEP 8 style guide. It uses the pep8 utility to determine what parts of the code needs to be formatted. autopep8 is capable of fixing most of the formatting issues that can be reported by pep8.
https://github.com/hhatto/autopep8