
#Scanning a string with Flex



        There's the Stack Overflow solution,
         using a conveniently named and easy-to-discover function:
        { // Minimalistic lexer reading from a string with yy_scan_string()
            /* @BAKE
                flex -o scan_string.yy.c $@
                gcc  -o scan_string.out  scan_string.yy.c
                ./scan_string.out
               @STOP
             */
            %option noyywrap
            %%
            %%
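            int main(void) {
                const char * input_str = "This is my input";
                /* Copies input_str into a fresh buffer and switches to it. */
                YY_BUFFER_STATE const b = yy_scan_string(input_str);
                yylex();
                yy_delete_buffer(b); /* Release the buffer created above. */
                return 0;
            }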
        }
        I find this quite ugly, to be honest. It goes through the buffer-switching
         API because that API was designed with changing buffers in mind.
         Often, however, you want your scanner to only ever work on (specific)
         strings, in which case the above code is almost deceptive about what
         is going on and what we intend to do.
        Another way to accomplish the same effect is to redefine YY_INPUT,
         like so:
        { // Minimalistic lexer reading from a string utilizing YY_INPUT
            /* @BAKE
                flex -o string.yy.c $@
                gcc  -o string.out  string.yy.c -lfl
                ./string.out
               @STOP
             */
            %{
                #include <string.h> /* memcpy() */
                const char input_str[] = "This is my input";
                const int  len         = sizeof(input_str)-1; /* length without the NUL */
                      int  offset      = len;                 /* bytes still unread     */
                #define YY_INPUT(buf, result, max_size) {                        \
                    int cpi = (offset && offset > max_size) ? max_size : offset; \
                    memcpy(buf, input_str+(len-offset), cpi);                    \
                    result = cpi;                                                \
                    offset = (cpi > offset) ? 0 : offset - cpi;                  \
                }
            %}
            %%
            %%
        }
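        To see what the macro does each time the scanner asks for more input,
         the same arithmetic can be pulled out into a plain C function. This is only
         a sketch: the names string_source and string_source_read are made up for
         illustration and are not part of Flex. Returning 0 once the string is
         exhausted mirrors the value YY_INPUT stores in "result" to signal end of input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type, not part of Flex: a string plus a read cursor. */
struct string_source {
    const char *data;      /* the input string              */
    size_t      len;       /* its length, without the NUL   */
    size_t      remaining; /* bytes not yet handed out      */
};

/* Copy at most max_size bytes of unread input into buf.
 * Returns the number of bytes copied; 0 means end of input. */
size_t string_source_read(struct string_source *src, char *buf, size_t max_size) {
    assert(src->remaining <= src->len);
    size_t n = src->remaining < max_size ? src->remaining : max_size;
    memcpy(buf, src->data + (src->len - src->remaining), n);
    src->remaining -= n;
    return n;
}
```

        Calling it repeatedly with a small max_size hands out the string piece by
         piece, which is how the scanner would drain YY_INPUT if its internal buffer
         were smaller than the input.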
        Now, is the change more than aesthetic? Well, there's one less heap allocation,
         but realistically that's not even worth mentioning. There could be a much
         stronger argument for performance if we could somehow dodge copying the input,
         similarly to yy_scan_buffer(). Speaking of which, reading its source code reveals
         that its magic is relatively trivial: among other things, it assigns "b->yy_ch_buf"
         to the buffer argument. That could allow one to hack it into reality. However,
         section 11 of the flex info page, in the description of yy_create_buffer(),
         states the following:
            "The 'YY_BUFFER_STATE' type is a pointer to an opaque 'struct yy_buffer_state'
              structure."
        Meaning, while yy_buffer_state is not opaque (from the perspective of the scanner),
         it's documented as such, which makes accessing its members a gamble.
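        For comparison, the documented zero-copy route is yy_scan_buffer(base, size).
         It scans the caller's memory in place, but requires the last two bytes of the
         buffer to be NUL (YY_END_OF_BUFFER_CHAR) and returns NULL otherwise. Below is a
         small sketch of preparing such a buffer in plain C; has_flex_sentinels is a
         hypothetical helper invented for this example, not a Flex function.

```c
#include <assert.h>
#include <stddef.h>

/* The literal ends in an explicit "\0"; together with the implicit
 * terminator, the array ends in the two NUL bytes yy_scan_buffer() expects.
 * The array is writable on purpose: the scanner may modify the buffer
 * it is scanning. */
static char input_buf[] = "This is my input\0";

/* Hypothetical helper: check the precondition before handing the
 * buffer to yy_scan_buffer(). Returns 1 if both sentinels are present. */
int has_flex_sentinels(const char *base, size_t size) {
    assert(base != NULL);
    return size >= 2 && base[size - 1] == '\0' && base[size - 2] == '\0';
}
```

        Inside the scanner file this would be used as
         yy_scan_buffer(input_buf, sizeof(input_buf)), followed by yy_delete_buffer()
         as in the first listing.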
        Those are all the relevant facts; now pick your poison.