Below are some Python inserts that enable YARA scanning of in-memory objects while parsing a file such as a PDF. This particular example enables YARA signature scanning of parsed, filtered PDF objects in Didier Stevens' PDF-Parser.
[...imports...]
import yara
import mmap
rules = yara.compile('path/rulefile')
[...parsing code...]
############################ yara insert around line 558
############################ just before
######### print ' %s' % FormatOutput(filtered, options.raw)
memmap=mmap.mmap(-1,len(filtered))
memmap.write(filtered)
memmap.seek(0)
matches = rules.match(data=memmap.read(len(filtered)))
memmap.close()
for m in matches:
    print ' yara: %s' % (m)
##################################
[...resume Didier's code...]
print ' %s' % FormatOutput(filtered, options.raw)
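The insert above copies the stream into an anonymous mmap and reads it straight back before matching. Since yara-python's `match(data=...)` accepts an in-memory buffer, the same scan can probably be done without the round trip. Below is a minimal sketch of that idea, written for Python 3. The names `scan_buffer` and `load_rules` are illustrative and not part of PDF-Parser, and the empty-buffer guard is an added assumption (an anonymous map of length zero is typically rejected, so the mmap version is likely to fail on empty streams).

```python
def scan_buffer(rules, data):
    """Return the names of the rules that match `data`.

    `rules` is assumed to be a compiled yara.Rules object (or anything that
    exposes the same match(data=...) method); `data` is the decoded stream.
    """
    if not data:
        return []  # nothing to scan, skip the call entirely
    # Each match object carries the name of the matching rule in `.rule`.
    return [match.rule for match in rules.match(data=data)]


def load_rules(path):
    """Compile a rule file; `path` is a placeholder for your own rule file."""
    import yara  # imported lazily so scan_buffer() can be used without yara
    return yara.compile(filepath=path)
```

In the parser this would be called as `scan_buffer(rules, filtered)` just before the `FormatOutput` line, printing each returned name.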
1 comment:
I codified it and made a couple of other fixes. Here is the patch to enable YARA scanning in Didier's PDF Parser v0.3.7:
35a36
> 2010/10/12: Added YARA bindings
51a53,54
> import yara
> import mmap
68a72,76
> rules = yara.compile(filepaths={
> 'sig1':'embedded_object_signature_set',
> 'sig2':'document_structure_signature_set'
> })
>
556a565,566
> if options.yara:
>     YaraScan(filtered)
558a569,570
> if options.yara:
>     YaraScan(object.Stream(False))
564a577,588
> def YaraScan(data):
>     memmap=mmap.mmap(-1,len(data))
>     memmap.write(data)
>     memmap.seek(0)
>     matches = rules.match(data=memmap.read(len(data)))
>     memmap.close()
>     if matches != []:
>         print "<< YARA RESULTS"
>         for m in matches:
>             print ' yara: %s' % (m)
>         print ">>"
>
621c645,649
< return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>'))
---
> try:
>     return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>'))
> except:
>     ### Added to fix unhexlify failure on miscoded malware
>     return binascii.unhexlify(''.join([c for c in data if c not in ' \t\n\r']).rstrip('>\x00'))
744a773
> oParser.add_option('-y', '--yara', action='store_true', default=False, help='scan with yara')
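The unhexlify fallback in the patch only handles trailing `>` and NUL characters. A somewhat more tolerant decoder could be sketched as below. This is not the patch's code: `ascii_hex_decode` is a hypothetical helper, written for Python 3 byte strings, and it assumes that everything after the first `>` (the ASCIIHexDecode end-of-data marker) can be ignored. It also pads an odd number of digits with a trailing `0`, which is how the PDF specification says the filter should treat a missing final digit.

```python
import binascii


def ascii_hex_decode(data):
    """Tolerant ASCIIHexDecode sketch for byte strings (hypothetical helper)."""
    # Keep only what precedes the '>' end-of-data marker, if there is one.
    body = data.split(b'>', 1)[0]
    # Drop whitespace and stray NUL bytes anywhere in the hex text.
    digits = body.translate(None, b' \t\n\r\x0c\x00')
    # An odd digit count is treated as if a final '0' followed.
    if len(digits) % 2:
        digits += b'0'
    return binascii.unhexlify(digits)
```

Input that still contains non-hex characters raises `binascii.Error`, so a caller that wants to keep parsing a malformed sample would need to catch that explicitly.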