c - Flex, continuous scanning stream (from socket). Did I miss something using yywrap()?
I am working on a socket-based scanner (continuous stream) using flex for pattern recognition. Flex doesn't find a match that overlaps 'array boundaries'. So I implemented yywrap() to set up new array content as soon as yylex() detects <<EOF>> (it will then call yywrap). No success so far.
Basically (for pin-pointing my problem) this is my code:
    %{

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    #define buffersize 26
                           /* 0123456789012345678901234 */
    char cbuf1[buffersize] = "hello everybody, lex su"; // warning, no '\0'
    char cbuf2[buffersize] = "per cool. thanks! ";
    char recvbuffer[buffersize];

    int packetcnt = 0;

    YY_BUFFER_STATE bufferstate1, bufferstate2;

    %}

    %option nounput
    %option noinput

    %%

    "super"     { ECHO; }
    .           { printf( "%c", yytext[0] );}

    %%

    int yywrap()
    {
        int retval = 1;

        printf(">> yywrap()\n");

        if( packetcnt <= 0 )    // stop after 2
        {
            // copy cbuf2 into recvbuffer
            memcpy(recvbuffer, cbuf2, buffersize);

            // yyrestart(NULL); // ?? has no effect

            // feed new data to flex
            bufferstate2 = yy_scan_bytes(recvbuffer, buffersize);

            packetcnt++;

            // tell flex to resume scanning
            retval = 0;
        }

        return(retval);
    }

    int main(void)
    {
        printf("lenght: %d\n", (int)sizeof(recvbuffer)) ;

        // copy cbuf1 into recvbuffer
        memcpy(recvbuffer, cbuf1, buffersize);

        packetcnt = 0;

        bufferstate1 = yy_scan_bytes(recvbuffer, buffersize);

        yylex();

        yy_delete_buffer(bufferstate1);
        yy_delete_buffer(bufferstate2);

        return 0;
    }
This is my output:

    dkmbpro:test dkroeske$ ./text
    lenght: 26
    hello everybody, lex su>> yywrap()
    per cool. thanks! >> yywrap()
So there is no match on 'super'. According to the documentation, the lexer is not 'reset' between yywrap calls. What did I miss? Thanks.
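The mechanism for providing a stream of input to flex is to provide a definition of the YY_INPUT macro, which is called every time flex needs to refill its buffer [Note 1]. The macro is called with three arguments, roughly like this:

    YY_INPUT(buffer, bytes_read, max_bytes)

Because YY_INPUT is a macro, bytes_read is passed as an assignable variable rather than as a pointer.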
The macro is expected to read up to max_bytes into buffer, and to set bytes_read to the actual number of bytes read. If there is no more input in the stream, YY_INPUT should set bytes_read to YY_NULL (which is 0). There is no way to flag an input error other than by setting the end-of-file condition. Do not set bytes_read to a negative value.
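The contract described above can be sketched as an ordinary C function. This is only a minimal sketch, not flex's own implementation: fill_from_file is a hypothetical helper name, and the YY_INPUT hookup shown in the trailing comment is an assumption about how it could be wired into a scanner.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: read up to max_bytes from `in` into `buf`.
 * Returns the number of bytes read, or 0 when nothing more could be
 * read. A read error is reported as 0 as well, because YY_INPUT has
 * no separate error channel. */
static int fill_from_file(FILE *in, char *buf, size_t max_bytes)
{
    size_t n = fread(buf, 1, max_bytes, in);
    return (int)n;
}

/* Possible hookup in the definitions section of a .l file:
 *
 *   #define YY_INPUT(buf, result, max_size) \
 *       ((result) = fill_from_file(yyin, (buf), (max_size)))
 */
```

Note that fread blocks until max_bytes have arrived or the stream ends, which is fine for files but may be too eager for an interactive source.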
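Note that YY_INPUT does not provide any indication of where to read the input from, nor any sort of userdata argument. The only provided mechanism is the global yyin, which is a FILE*. (You could create a FILE* from a file or socket descriptor with fdopen, and get the descriptor back with fileno. Other workarounds are beyond the scope of this answer.)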
When the scanner encounters the end of a stream, as indicated by YY_INPUT returning 0, it finishes the current token [Note 2], and then calls yywrap to decide whether there is another stream to process. As the manual indicates, it does not reset the parser state (that is, which start condition it happens to be in; the current line number if line counting is enabled, etc.). However, it does not allow tokens to span two streams.
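To see why this distinction matters for the example in the question, here is a toy illustration in plain C. It is not how flex works internally: matcher, feed and the two count functions are hypothetical names, and the matcher only knows the single word "super". It shows that a partial match survives when both chunks arrive as refills of one stream, and is lost when the first chunk is treated as a complete stream.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scanner: counts occurrences of "super".
 * `matched` is how many leading bytes of the word have been seen. */
typedef struct {
    size_t matched;
    int hits;
} matcher;

static void feed(matcher *m, const char *chunk, size_t len)
{
    static const char word[] = "super";
    for (size_t i = 0; i < len; i++) {
        if (chunk[i] == word[m->matched]) {
            if (++m->matched == sizeof word - 1) {
                m->hits++;
                m->matched = 0;
            }
        } else {
            /* Restart; in "super" only 's' can begin a new match. */
            m->matched = (chunk[i] == word[0]) ? 1 : 0;
        }
    }
}

/* Both chunks are refills of one stream: match state is kept. */
static int count_as_one_stream(const char *a, size_t alen,
                               const char *b, size_t blen)
{
    matcher m = {0, 0};
    feed(&m, a, alen);
    feed(&m, b, blen);
    return m.hits;
}

/* Each chunk is its own stream: the partial token is finished
 * (here: dropped) when the first stream ends. */
static int count_as_two_streams(const char *a, size_t alen,
                                const char *b, size_t blen)
{
    matcher m = {0, 0};
    feed(&m, a, alen);
    m.matched = 0;
    feed(&m, b, blen);
    return m.hits;
}
```

With the two buffers from the question, the first function finds the word once and the second never finds it, which mirrors the observed output.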
The yywrap mechanism is most commonly used when a parser/scanner is applied to a number of different files specified on the command line. In that use case, it would be a bit odd if a token could start in one file and continue into another one; most language implementations prefer their files to be somewhat self-contained. (Consider multi-line string literals, for example.) Normally, you actually want to reset more of the parser state as well (the line number, certainly, and sometimes the start condition), but that is the responsibility of yywrap. [Note 3]
For lexing from a socket, you'll probably want to call recv from your YY_INPUT implementation. But for experimentation purposes, here's a simple YY_INPUT which just returns data from a memory buffer:
    #include <stddef.h>
    #include <string.h>

    /* Globals which describe the input buffer. */
    const char* my_in_buffer = NULL;
    const char* my_in_pointer = NULL;
    const char* my_in_limit = NULL;

    void my_set_buffer(const char* buffer, size_t buflen) {
      my_in_buffer = my_in_pointer = buffer;
      my_in_limit = my_in_buffer + buflen;
    }

    /* For debugging, limit the number of bytes YY_INPUT will
     * return.
     */
    #define my_maxread 26

    /* This is technically incorrect because it returns 0
     * on EOF, assuming that YY_NULL is 0.
     * Flex passes `ret` as an assignable variable, not as a pointer,
     * so it is assigned directly.
     */
    #define YY_INPUT(buf, ret, maxlen) do {          \
        size_t avail = my_in_limit - my_in_pointer;  \
        size_t toread = maxlen;                      \
        if (toread > avail) toread = avail;          \
        if (toread > my_maxread) toread = my_maxread;\
        (ret) = toread;                              \
        memcpy(buf, my_in_pointer, toread);          \
        my_in_pointer += toread;                     \
      } while (0)
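For the socket case itself, the same idea could be sketched with recv. This is an untested-against-flex sketch under stated assumptions: fill_from_socket is a hypothetical helper, my_socket in the comment is an assumed global holding a connected descriptor, and there is no retry on EINTR.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: pull up to max_bytes from a connected socket.
 * Returns the number of bytes received, or 0 when the peer has closed
 * the connection. recv() errors are mapped to 0 as well, because
 * YY_INPUT can only signal end of input. */
static int fill_from_socket(int sockfd, char *buf, size_t max_bytes)
{
    ssize_t n = recv(sockfd, buf, max_bytes, 0);
    return n > 0 ? (int)n : 0;
}

/* Possible hookup in the .l file, assuming a global `my_socket`:
 *
 *   #define YY_INPUT(buf, result, max_size) \
 *       ((result) = fill_from_socket(my_socket, (buf), (max_size)))
 */
```

Because recv returns as soon as some data is available, a token such as "super" that is split across two packets simply arrives in two consecutive refills of the same stream.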
Notes

1. This is not quite true; the buffer state includes a flag which indicates whether the buffer can be refilled. If you use yy_scan_bytes, the buffer state which is created is marked as non-refillable.

2. It's actually a bit more complicated than that, because flex scanners sometimes need to look ahead in order to decide which token has been matched, and the end-of-stream indication might occur during the lookahead. After the scanner backs up to the end of the recognized token, it still has to rescan the lookahead characters, which may contain several more tokens. To handle this, it sets a flag in the buffer state which indicates that end-of-stream has been reached, which prevents YY_INPUT from being called each time the scanner hits the end of the buffer. Despite this, it's probably a good idea to make sure that your YY_INPUT implementation will continue to return end-of-stream in case it is called again after an end-of-stream return.

3. For a concrete example, suppose we wanted to implement some kind of #include mechanism. flex provides the yypush_buffer_state/yypop_buffer_state mechanism, which allows you to implement an include stack. (The similarly named yy_push_state/yy_pop_state manage the start-condition stack, not input buffers.) You'd call yypush_buffer_state once the include directive has been scanned, but yypop_buffer_state needs to be called from yywrap. Again, very few languages would allow a token to start in the included source file and continue following the include directive.
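The advice in the lookahead note, that an input routine should keep reporting end-of-stream once it has done so, could be sketched like this. sticky_reader and sticky_read are hypothetical names, and plain read on a descriptor is used here only as an assumed input source.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical reader state: a descriptor plus a sticky EOF flag. */
typedef struct {
    int fd;
    int eof_seen;
} sticky_reader;

/* Returns the number of bytes read, or 0 at end of input. Once 0 has
 * been returned, every later call returns 0 again without touching
 * the descriptor. */
static int sticky_read(sticky_reader *r, char *buf, size_t max_bytes)
{
    if (r->eof_seen)
        return 0;

    ssize_t n = read(r->fd, buf, max_bytes);
    if (n <= 0) {
        r->eof_seen = 1;   /* errors are treated as end of input too */
        return 0;
    }
    return (int)n;
}
```

For a socket this avoids a second call blocking, or unexpectedly returning fresh data, after the scanner has already been told that the stream ended.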