
Re: trouble doing regex in file containing both ascii and binary content

From: <sisy...@optusnet.com.au>
Sat, 15 Feb 2014 22:33:23 +1100
Hi Greg,

This list is all but dead – it may be that you and I are the only people receiving mail from it.
Much better, IMO, to post these types of questions to perlmonks.

Anyway ... this might help:

#################################
use strict;
use warnings;

my $str = "\x1F\x8B\x08";

print "String contains: $str\n";

open(WR, '>', 'file.bin') or die $!;
binmode WR;
print WR $str;
close WR or die $!;

undef $/;

open(RD, '<', 'file.bin') or die $!;
binmode RD;
my $contents = <RD>;
close RD or die $!;

if($contents =~ /$str/){print "ok 1\n"}

# To safeguard against presence of
# metacharacters in $str:

if($contents =~ /\Q$str\E/){print "ok 2\n"}
######################################

Cheers,
Rob

From: Greg VisionInfosoft 
Sent: Saturday, February 15, 2014 9:41 AM
To: Perl...@listserv.ActiveState.com 
Subject: trouble doing regex in file containing both ascii and binary content
i can't figure out what i'm doing wrong here.
i ran wireshark to monitor a small http client/server query/response.
the point of the exercise is to see exactly what an ajax response looks like (as i'm trying to learn ajax).

unfortunately, the ajax response is sent from server in 'gzip' format (not plain text).

so wireshark shows two standard http headers and at the end of the stream is the binary 'gzipped' small stream.

i've saved this wireshark tcp 'stream' to a file.  viewing the file in hex mode, i see clearly that the first three bytes of the gzipped stream are hex 1F, hex 8B, hex 08.

what i need to do next is save just the binary gzipped stream to a stand alone file, then see if i can un-gzip it to read the plain text contents.
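A minimal sketch of that extract-and-decompress step, assuming the data in memory really holds the raw bytes. The capture here is a stand-in built in memory (not the actual wireshark file), and `extract_gzip` is a hypothetical helper name. It uses index() and substr() rather than a regex, plus the core IO::Uncompress::Gunzip module:

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Return everything from the first gzip header (1F 8B 08) onward,
# or undef if the header is not present. (Hypothetical helper.)
sub extract_gzip {
    my ($data) = @_;
    my $pos = index($data, "\x1F\x8B\x08");
    return if $pos < 0;
    return substr($data, $pos);
}

# Stand-in for the saved capture: text headers followed by a gzipped body.
my $plain = "hello from the server\n";
my $body  = '';
gzip(\$plain => \$body) or die "gzip failed: $GzipError";
my $capture = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n" . $body;

my $gz = extract_gzip($capture);
defined $gz or die "no gzip header found\n";

# Decompress from one in-memory buffer into another.
my $text = '';
gunzip(\$gz => \$text) or die "gunzip failed: $GunzipError";
print $text;
```

For a real file, $capture would instead be read with binmode set on the handle, as in the scripts in this thread.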

in theory, a straight forward task.

i wrote a quick few-line perl script that does the following: opens the saved wireshark tcp stream file; sets this input file to binary mode (so as not to change any internal binary byte values); undefines the input record separator (to slurp the entire file into memory when read); reads the file's contents into a var; does a simple pattern match of \x1F\x8B\x08; then saves the matched pattern $& and what follows the match $' to a new file. (right now the script doesn't actually output to a file yet, it just dumps to screen.)

for reasons that elude me, the pattern match fails.

i know the 3 bytes are in the file, yet the pattern match to those 3 bytes fails.

any ideas?

here's the small script.

open(IN, '<', $ARGV[0]) || die "can't open input file: $!";
binmode(IN);

undef $/;

my $data = <IN>;

if ($data =~ /\x1F\x8B\x08/) {
  print "matched: " . $& . $';
} else {
  print "no match\n";
}
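One way to narrow down a "no match" result is to look at the bytes the script actually read, rather than what a hex editor shows for some other copy of the file. The sketch below is only a diagnostic, not a diagnosis of this particular case. It assumes, as one possibility, that a capture may have been saved as an ASCII rendering in which non-printable bytes became dots (0x2E). `hex_bytes` is a hypothetical helper and both sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hex-dump $count bytes of $data starting at $offset, e.g. "1f 8b 08".
sub hex_bytes {
    my ($data, $offset, $count) = @_;
    return join ' ', unpack('(H2)*', substr($data, $offset, $count));
}

# Two made-up samples: one with raw bytes, one with dots in their place.
my $raw      = "Content-Length: 201\r\n\r\n\x1F\x8B\x08\x00";
my $textdump = "Content-Length: 201\r\n\r\n....";

for my $data ($raw, $textdump) {
    my $pos  = index($data, "\x1F\x8B\x08");      # -1 means not present
    my $tail = hex_bytes($data, length($data) - 4, 4);
    print "gzip header at offset $pos, last four bytes: $tail\n";
}
```

Comparing length($data) with the file size reported by -s on the same path is another quick check that the whole file was read.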


the contents of the wireshark stream is as follows...


POST /ajax/demo_post.asp HTTP/1.1
Host: www.w3schools.com
Connection: keep-alive
Content-Length: 0
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36
Origin: http://www.w3schools.com
Accept: */*
Referer: http://www.w3schools.com/ajax/tryajax_post.htm
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: ASPSESSIONIDAASDBBTC=BFEPJKCDLGDHEEOJIKANOEHP



HTTP/1.1 200 OK
Cache-Control: private,public
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Fri, 14 Feb 2014 21:03:48 GMT
Content-Length: 201


...........`.I.%&/m.{.J.J..t...`....@.........iG#).*..eVe]f...@......{....{....;.N'...?\fd.l..J...!....?~|.?"......&.V.6_..U..u...y...t........./_.I.y;.f..wWG.qBo..
.Q.www.....~..h......./......h.c...


note: the binary data at the end is obviously not easily discerned here in ascii mode.  when i open this same file in a binary editor, the actual binary contents (displayed in hex) are as follows... (i've inserted an extra space between bytes so the hex values are easily discerned).

1f 8b 08 00 00 00 00 00 04 00 ed bd 07 60 1c 49 96 25 26 2f 6d ca 7b 7f 4a f5 4a d7 e0 74 a1 08 80 60 13 24 d8 90 40 10 ec c1 88 cd e6 92 ec 1d 69 47 23 29 ab 2a 81 ca 65 56 65 5d 66 16 40 cc ed 9d bc f7 de 7b ef bd f7 de 7b ef bd f7 ba 3b 9d 4e 27 f7 df ff 3f 5c 66 64 01 6c f6 ce 4a da c9 9e 21 80 aa c8 1f 3f 7e 7c 1f 3f 22 1e af 8e de cc 8b 26 9d 56 cb 36 5f b6 e9 55 d6 a4 75 fe 8b d6 79 d3 e6 b3 74 dd 14 cb 8b b4 9d e7 e9 cb 2f 5f bf 49 17 79 3b af 66 e3 c7 77 57 47 bf 71 42 6f be b2 0d b3 f6 51 ba 77 77 77 ff ee de ce ee 7e ba ff 68 e7 de a3 fd 87 e9 cb 2f d0 f4 ff 01 a8 9f 68 15 63 00 00 00
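A spaced hex listing like the one above can be turned back into bytes with pack, which gives a second route to the gzipped data that does not depend on the saved stream file. This is a small sketch under that assumption. `hex_to_bytes` is a hypothetical helper, and only the first four bytes of the listing are used here:

```perl
use strict;
use warnings;

# Turn a spaced hex dump ("1f 8b 08 ...") back into the bytes it represents.
sub hex_to_bytes {
    my ($hex) = @_;
    $hex =~ s/[^0-9A-Fa-f]//g;    # drop spaces and newlines
    return pack 'H*', $hex;
}

my $bytes = hex_to_bytes('1f 8b 08 00');
print length($bytes), " bytes, starts with gzip header: ",
    (index($bytes, "\x1F\x8B\x08") == 0 ? "yes" : "no"), "\n";
```

Feeding the full listing through the same helper and writing the result to a file opened with binmode should give something a gunzip tool can read, provided the listing is complete.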




--------------------------------------------------------------------------------
_______________________________________________
Perl-Win32-Users mailing list
Perl...@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs

