c - Flex, continuous scanning of a stream (from a socket). Did I miss something using yywrap()?


I'm working on a socket-based scanner (continuous stream) using flex for the pattern recognition. Flex doesn't find a match that overlaps 'array boundaries'. So I implemented yywrap() to set up new array content as soon as yylex() detects <<EOF>> (it calls yywrap). No success so far.

Basically (for pinpointing my problem) my code:

%{
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define buffersize 26

                     /*   0123456789012345678901234 */
char cbuf1[buffersize] = "hello everybody, lex is su";  // warning, no '\0'
char cbuf2[buffersize] = "per cool. thanks!         ";
char recvbuffer[buffersize];

int packetcnt = 0;

YY_BUFFER_STATE bufferstate1, bufferstate2;

%}

%option nounput
%option noinput

%%

"super"                 { ECHO; }
.                       { printf( "%c", yytext[0] );}

%%

int yywrap()
{
    int retval = 1;

    printf(">> yywrap()\n");

    if( packetcnt <= 0 )    // stop after 2
    {
        // copy cbuf2 to recvbuffer
        memcpy(recvbuffer, cbuf2, buffersize);

        //
        yyrestart(NULL); // ?? has no effect

        // feed new data to flex
        bufferstate2 = yy_scan_bytes(recvbuffer, buffersize);

        //
        packetcnt++;

        // tell flex to resume scanning
        retval = 0;
    }

    return(retval);
}

int main(void)
{
    printf("lenght: %d\n", (int)sizeof(recvbuffer)) ;

    // copy cbuf1 to recvbuffer
    memcpy(recvbuffer, cbuf1, buffersize);

    //
    packetcnt = 0;

    //
    bufferstate1 = yy_scan_bytes(recvbuffer, buffersize);

    //
    yylex();

    yy_delete_buffer(bufferstate1);
    yy_delete_buffer(bufferstate2);

    return 0;
}

This gives the following output:

dkmbpro:test dkroeske$ ./text
lenght: 26
hello everybody, lex is su>> yywrap()
per cool. thanks!         >> yywrap()

So there is no match on 'super'. According to the documentation, the lexer is not 'reset' between yywraps. What am I missing? Thanks.

The mechanism for providing a stream of input to flex is to provide a definition of the YY_INPUT macro, which is called every time flex needs to refill its buffer [Note 1]. The macro is called with three arguments, like this:

YY_INPUT(buffer, bytes_read, max_bytes)

The macro is expected to read at most max_bytes bytes into buffer, and to assign the number of bytes actually read to bytes_read (an int variable that the macro assigns to directly, not a pointer). If there is no more input in the stream, YY_INPUT should set bytes_read to YY_NULL (which is 0). There is no way to flag an input error other than setting the end-of-file condition. Do not have YY_INPUT set a negative value.
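To make that contract concrete, here is a minimal sketch of a refill function for a connected socket. The name socket_read_chunk is hypothetical, and the sketch assumes a POSIX system and a blocking socket (on a non-blocking socket, EAGAIN would wrongly be reported as end of file). Receive errors are mapped to 0 because, as described above, end of file is the only failure signal YY_INPUT has.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper following the YY_INPUT contract:
 *   - returns the number of bytes actually read (> 0) while data flows;
 *   - returns 0 (YY_NULL) when the peer has closed the connection;
 *   - never returns a negative value, so a receive error is reported
 *     as end of file, since YY_INPUT has no other way to signal it.
 * Assumes fd is a connected, blocking stream socket. */
int socket_read_chunk(int fd, char *buf, int max_bytes)
{
    assert(buf != NULL && max_bytes > 0);
    for (;;) {
        ssize_t n = recv(fd, buf, (size_t)max_bytes, 0);
        if (n >= 0)
            return (int)n;      /* 0 means orderly shutdown by the peer */
        if (errno != EINTR)
            return 0;           /* treat real errors as end of file */
        /* interrupted by a signal before any data arrived: retry */
    }
}
```

Inside the scanner this could be wired up with something like `#define YY_INPUT(buf, result, max_size) ((result) = socket_read_chunk(sock_fd, (buf), (int)(max_size)))`, where sock_fd is an assumed global holding the connected descriptor. Because recv returns as soon as some data is available, the scanner can work on partial packets instead of waiting for a full buffer.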

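Note that YY_INPUT does not provide any indication of where to read input from, nor any sort of userdata argument. The only provided mechanism is the global yyin, which is a FILE*. (You can create a FILE* from a file or socket descriptor with fdopen, and get the descriptor back from a FILE* with fileno. Other workarounds are beyond the scope of this answer.)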
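A minimal sketch of the fdopen route, with a hypothetical helper name, assuming a POSIX system: the descriptor is wrapped in a read-only stream that could then be assigned to yyin, and fileno recovers the descriptor when needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: wraps an open descriptor (for example a connected,
 * blocking socket) in a read-only stdio stream, suitable for assigning
 * to yyin. Returns NULL if fdopen fails. The stream takes ownership of
 * the descriptor, so fclose() on the stream also closes fd. */
FILE *stream_from_descriptor(int fd)
{
    assert(fd >= 0);
    return fdopen(fd, "r");
}
```

One thing to keep in mind, assuming flex's default YY_INPUT: for non-interactive input it reads with fread, which waits until the requested number of bytes has arrived or the stream ends. On a live socket that can delay scanning until a whole buffer's worth of data has been received, which is one reason a custom YY_INPUT calling recv directly tends to be the more predictable choice.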

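When the scanner encounters the end of a stream, as indicated by YY_INPUT returning 0, it finishes the current token [Note 2], and then calls yywrap to decide whether there is another stream to process. As the manual indicates, it does not reset the scanner state (that is, which start condition it happens to be in, the current line number if line counting is enabled, etc.). However, it does not allow a token to span two streams, which is why 'super' is never matched when 'su' ends one buffer and 'per' starts the next.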
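One way to apply this to the question's setup is to stop calling yy_scan_bytes once per packet and instead hand the received packets to flex through YY_INPUT, so flex sees them as a single stream. A minimal sketch of the refill side, assuming the packets are held in memory in a simple array; packet_queue and queue_read are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet queue: each entry is one received packet. */
struct packet_queue {
    const char *const *packets;   /* packet payloads (no '\0' required) */
    const size_t *lengths;        /* payload sizes */
    size_t count;                 /* number of packets */
    size_t index;                 /* packet currently being consumed */
    size_t offset;                /* bytes already consumed from it */
};

/* YY_INPUT-style refill: copies up to max_bytes into buf, continuing
 * across packet boundaries, and returns 0 only once every packet has
 * been used up. */
int queue_read(struct packet_queue *q, char *buf, size_t max_bytes)
{
    size_t copied = 0;
    assert(q != NULL && buf != NULL);
    while (copied < max_bytes && q->index < q->count) {
        size_t left = q->lengths[q->index] - q->offset;
        size_t take = max_bytes - copied;
        if (take > left)
            take = left;
        memcpy(buf + copied, q->packets[q->index] + q->offset, take);
        copied += take;
        q->offset += take;
        if (q->offset == q->lengths[q->index]) {   /* packet exhausted */
            q->index++;
            q->offset = 0;
        }
    }
    return (int)copied;
}
```

With something like `#define YY_INPUT(buf, result, max_size) ((result) = queue_read(&packets, (buf), (size_t)(max_size)))`, where packets is an assumed global queue, the two buffers from the question would reach the scanner as one stream, so "su" and "per" can be matched together as "super". A real socket scanner would block in recv for the next packet rather than report end of file when the queue runs dry.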

The yywrap mechanism is most commonly used when a scanner is applied to a number of different files specified on the command line. In that use case, it would be a bit odd if a token could start in one file and continue in another one; most language implementations prefer files to be self-contained. (Consider multi-line string literals, for example.) Normally, you actually want to reset more of the scanner state (certainly the line number, and perhaps the start condition), and that is the responsibility of yywrap. [Note 3]
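A minimal, flex-independent sketch of that multi-file pattern, with hypothetical names: current_input and current_line stand in for yyin and yylineno, and wrap_to_next_file follows yywrap's return convention (0 means another input is ready, 1 means stop). In a real scanner, yywrap would assign the new stream to yyin, reset yylineno, and possibly BEGIN(INITIAL).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for flex's yyin and yylineno in this sketch. */
static FILE *current_input = NULL;
static int current_line = 1;

static const char *const *remaining_names = NULL;  /* files not yet opened */
static int remaining_count = 0;

void set_input_files(const char *const *names, int count)
{
    assert(count >= 0 && (names != NULL || count == 0));
    remaining_names = names;
    remaining_count = count;
}

/* Same return convention as yywrap: 0 = another input is ready, 1 = done. */
int wrap_to_next_file(void)
{
    if (current_input != NULL) {
        fclose(current_input);
        current_input = NULL;
    }
    while (remaining_count > 0) {
        const char *name = *remaining_names++;
        remaining_count--;
        FILE *f = fopen(name, "r");
        if (f == NULL) {
            perror(name);           /* skip files that cannot be opened */
            continue;
        }
        current_input = f;
        current_line = 1;           /* per-file state is reset here */
        return 0;
    }
    return 1;
}
```

Resetting per-file state inside the wrap function keeps each file self-contained, which matches the point above: the scanner itself deliberately leaves that decision to yywrap.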

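For lexing from a socket, you'll probably want to call recv in your YY_INPUT implementation. For experimentation purposes, here's a simple YY_INPUT which just returns data from a memory buffer (these definitions belong in the %{ ... %} section of the scanner file):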

/* Globals describing the input buffer. */
const char* my_in_buffer = NULL;
const char* my_in_pointer = NULL;
const char* my_in_limit = NULL;

void my_set_buffer(const char* buffer, size_t buflen) {
    my_in_buffer = my_in_pointer = buffer;
    my_in_limit = my_in_buffer + buflen;
}

/* For debugging, limit the number of bytes YY_INPUT will
 * return.
 */
#define MY_MAXREAD 26

/* Technically incorrect because it returns 0
 * on EOF, assuming YY_NULL is 0.
 */
#define YY_INPUT(buf, ret, maxlen) do {                         \
    size_t avail = (size_t)(my_in_limit - my_in_pointer);       \
    size_t toread = (size_t)(maxlen);                           \
    if (toread > avail) toread = avail;                         \
    if (toread > MY_MAXREAD) toread = MY_MAXREAD;               \
    memcpy((buf), my_in_pointer, toread);                       \
    my_in_pointer += toread;                                    \
    (ret) = (int)toread;   /* ret is an lvalue, not a pointer */ \
} while (0)

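Notes

  1. This is not quite true; the buffer state includes a flag which indicates whether the buffer can be refilled. If you use yy_scan_bytes, the buffer state it creates is marked as non-refillable, so YY_INPUT is never called for it and the scanner goes straight to end-of-file handling (and yywrap) when it reaches the end of that buffer.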

  2. It's a bit more complicated than that, because flex scanners sometimes need to look ahead in order to decide which token has been matched, and the end-of-stream indication might occur during the lookahead. After the scanner backs up to the end of the recognized token, it still has to rescan the lookahead characters, which may contain several more tokens. To handle this, it sets a flag in the buffer state which indicates that end-of-stream has been reached, which prevents YY_INPUT from being called each time the scanner hits the end of the buffer. Despite this, it's a good idea to make sure your YY_INPUT implementation will continue to return end-of-stream in case it is called again after an end-of-stream return.

  3. For a concrete example, suppose you wanted to implement some kind of #include mechanism. Flex provides the yypush_buffer_state/yypop_buffer_state mechanism, which allows you to implement an include stack. You'd call yypush_buffer_state once the include directive has been scanned, but yypop_buffer_state needs to be called from yywrap. Again, very few languages allow a token to start in an included source file and continue after the include directive.
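Note 2's advice, that YY_INPUT should keep returning end-of-stream once it has done so, can be handled with a small state flag. A minimal sketch, assuming a POSIX descriptor and hypothetical names (sticky_reader, sticky_read): after the first 0 it never touches the descriptor again.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reader state for a YY_INPUT implementation. */
struct sticky_reader {
    int fd;       /* descriptor to read from */
    int at_eof;   /* nonzero once end of stream has been reported */
};

/* Returns the number of bytes read, or 0 at end of stream. After the
 * first 0 it keeps returning 0 without reading the descriptor again. */
int sticky_read(struct sticky_reader *r, char *buf, size_t max_bytes)
{
    assert(r != NULL && buf != NULL && max_bytes > 0);
    if (r->at_eof)
        return 0;
    for (;;) {
        ssize_t n = read(r->fd, buf, max_bytes);
        if (n > 0)
            return (int)n;
        if (n < 0 && errno == EINTR)
            continue;          /* interrupted before any data: retry */
        r->at_eof = 1;         /* EOF or a real error both end the stream */
        return 0;
    }
}
```

Making end-of-stream sticky in the reader, rather than relying on the underlying descriptor to repeat it, means a repeated call after the scanner's own end-of-stream handling cannot accidentally pick up stray data.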

