c - Tokenize a line without strtok -


I'm reading lines from a file and tokenizing them. Tokens are distinguished by being separated by space(s), or by being inside quotes (example: "to ken").

I wrote some code, but I have a problem with the pointers. I don't know how to store the tokens from a line, or rather how to set pointers to them.

It was also suggested that I put a 0 behind every token so that I'll "recognize" when it ends, and that I store in char *tokens[] only pointers that point to the start of the tokens.

My current code:

char *tokens[50];
int token_count;

int tokenize(char *line){
    token_count = 0;
    int n = 0;

    while(line[n] != NULL || line[n] != '\n'){
        while(isspace(line[n++]));
        if(line[n] == '"'){
            while(line[++n] != '"' || line[n] != NULL){
                /* set tokens[n] */
            }
        }
        else{
            while(!isspace(line[n++])){
                /* set tokens[n] */
            }
        }
        n++;
    }

    tokens[token_count] = 0;
}

You use the string base line and an index n to step through the string by incrementing n:

while (str[n] != '\0') n++; 

Your task might be easier if you used pointers:

while (*str != '\0') str++; 

Your tokens can then be expressed by the value of the pointer before reading the token, i.e. when you hit a quotation mark or a non-space character. That gives you the start of the token.

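What about the length of the token? In C, strings are arrays of chars, terminated by a null char. That means that each of your tokens contains the rest of the whole line and therefore all subsequent tokens. You could place a '\0' after each token, but that has two drawbacks: it doesn't work on read-only string literals and, depending on your token syntax, it is not always possible. For example, the string a"b b"c should probably parse as the three tokens a, "b b" and c, but there is no room between them for a null char, so writing one would overwrite the first character of the next token and break the tokenising process.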

An alternative is to store the tokens as pairs of a pointer to the starting char and a length. These tokens are no longer null-terminated, so you have to write them to a temporary buffer if you want to use them with the standard C string functions.

Here's a way to do that.

#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

struct token {
    const char *str;
    int length;
};

/*
 * Scans all tokens in p and stores at most n of them in tk.
 * Returns the total number of tokens found, or -1 if a quote is not closed.
 */
int tokenize(const char *p, struct token tk[], int n)
{
    const char *start;
    int count = 0;

    while (*p) {
        /* isspace requires a value representable as unsigned char */
        while (isspace((unsigned char) *p)) p++;
        if (*p == '\0') break;

        start = p;
        if (*p == '"') {
            p++;
            while (*p && *p != '"') p++;
            if (*p == '\0') return -1;        /* quote not closed */
            p++;
        } else {
            while (*p && !isspace((unsigned char) *p) && *p != '"') p++;
        }

        if (count < n) {
            tk[count].str = start;
            tk[count].length = (int) (p - start);
        }
        count++;
    }

    return count;
}

void token_print(const struct token tk[], int n)
{
    int i;

    for (i = 0; i < n; i++) {
        printf("[%d] '%.*s'\n", i, tk[i].length, tk[i].str);
    }
}

#define MAX_TOKEN 10

int main()
{
    const char *line = "the \"new york\" stock exchange";
    struct token tk[MAX_TOKEN];
    int n;

    n = tokenize(line, tk, MAX_TOKEN);
    if (n > MAX_TOKEN) n = MAX_TOKEN;
    token_print(tk, n);

    return 0;
}

The start of each token is saved in a local variable and assigned to the token after it has been scanned. When p points to the character after the token, the expression:

p - start 

gives you the length. (This is called pointer arithmetic.) The routine scans all tokens, but assigns at most n of them so as not to overflow the provided buffer.

