c - Tokenize a line without strtok
I'm reading lines from a file and tokenizing them. Tokens are distinguished by being separated by space(s) or by being inside quotes (example: "to ken").

I wrote some code, but I have a problem with the pointers. I don't know how to store the tokens from a line, or rather how to set the pointers to them.

It was also suggested that I put a 0 behind every token I "recognize", so that I'll know where it ends, and that I store in char *tokens[] only the pointers that point to the start of the tokens.

My current code:
char *tokens[50];
int token_count;

int tokenize(char *line){
    token_count = 0;
    int n = 0;

    while(line[n] != NULL || line[n] != '\n'){
        while(isspace(line[n++]));
        if(line[n] == '"'){
            while(line[++n] != '"' || line[n] != NULL){
                /* set tokens[n] */
            }
        }
        else{
            while(!isspace(line[n++])){
                /* set tokens[n] */
            }
        }
        n++;
    }
    tokens[token_count] = 0;
}

You use the string base line and the index n to step through the string by incrementing n:
while (str[n] != '\0') n++;
Your task might be easier if you used pointers:
while (*str != '\0') str++;
Your tokens can then be expressed by the value of the pointer just before you read the token, i.e. when you hit a quotation mark or a non-space character. That gives you the start of the token.
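As a minimal sketch of that idea, the function below walks a line with a pointer and remembers where each word begins. The name word_starts and its parameters are hypothetical, and quotes are deliberately ignored so that only the "save the pointer at the first non-space" step is shown.

```c
#include <ctype.h>

/* Hypothetical helper: record where each whitespace-separated word begins.
   Quotes are not handled here. Returns the number of words found, which
   may be larger than max; at most max pointers are stored. */
int word_starts(const char *p, const char *starts[], int max)
{
    int count = 0;

    while (*p != '\0') {
        /* skip the separators in front of the next word */
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        /* p now points at the first char of a word: remember it */
        if (count < max) starts[count] = p;
        count++;

        /* move past the word itself */
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
    }
    return count;
}
```

Each stored pointer still refers into the original line, so nothing is copied and the line is not modified.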
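What about the length of the token? In C, strings are arrays of chars terminated by a null char. That means that each of your tokens contains the rest of the whole line and therefore all subsequent tokens. You could place a '\0' after each token, but that has two drawbacks: it doesn't work on read-only string literals and, depending on your token syntax, it is not always possible. For example, the string a"b b"c should probably parse into the three tokens a, "b b" and c, but placing null chars after the tokens would break the tokenising process, because there is no separator char between the tokens that could be overwritten.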
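For comparison, here is a rough sketch of the in-place approach for the simple case where tokens are separated by whitespace only. The name split_in_place is hypothetical, and the function assumes it is given a writable buffer; it overwrites the first separator after each word with '\0'.

```c
#include <ctype.h>

/* Hypothetical helper: split a writable buffer in place by overwriting the
   first separator after each word with '\0'. Quotes are not handled.
   Returns the number of words found; at most max pointers are stored. */
int split_in_place(char *p, char *tokens[], int max)
{
    int count = 0;

    while (*p != '\0') {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        if (count < max) tokens[count] = p;
        count++;

        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        if (*p != '\0') *p++ = '\0';    /* terminate the token, step past it */
    }
    return count;
}
```

This has to be called on an array such as char line[] = "the new york"; and not on a pointer to a string literal, which must not be modified. It also relies on there being a separator to overwrite, which is exactly what is missing in a"b b"c.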
An alternative is to store the tokens as pairs of a pointer to the starting char and a length. These tokens are no longer null-terminated, so you'll have to write them to a temporary buffer if you want to use them with the standard C string functions.
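Writing such a token to a temporary buffer could look roughly like the sketch below. It assumes a token is a struct holding a pointer and a length, here called struct token; the helper name token_copy and its return convention are made up for illustration.

```c
#include <string.h>

struct token {
    const char *str;    /* first char of the token, not null-terminated */
    int length;         /* number of chars in the token */
};

/* Hypothetical helper: copy a token into buf (of size bufsize) and
   null-terminate it. Returns 0 on success, -1 if it does not fit. */
int token_copy(const struct token *tk, char *buf, size_t bufsize)
{
    if (tk->length < 0 || (size_t)tk->length >= bufsize) return -1;

    memcpy(buf, tk->str, (size_t)tk->length);
    buf[tk->length] = '\0';
    return 0;
}
```

After a successful call, buf is an ordinary C string and can be passed to functions such as strcmp or strtol. For printing alone no copy is needed, since printf accepts a length and a pointer with the %.*s format.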
Here's a way to tokenize a line into such pointer and length pairs:
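#include <stdlib.h>
#include <stdio.h>
#include <ctype.h>

struct token {
    const char *str;    /* start of the token inside the line */
    int length;         /* number of chars, including any quotes */
};

/* Scan the line p and store at most n tokens in tk. Returns the number of
   tokens found (which may exceed n), or -1 if a quote is not closed. */
int tokenize(const char *p, struct token tk[], int n)
{
    const char *start;
    int count = 0;

    while (*p) {
        /* skip leading white space */
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;

        start = p;
        if (*p == '"') {
            /* quoted token: scan up to and including the closing quote */
            p++;
            while (*p && *p != '"') p++;
            if (*p == '\0') return -1;      /* quote not closed */
            p++;
        } else {
            /* plain token: ends at white space or at an opening quote */
            while (*p && !isspace((unsigned char)*p) && *p != '"') p++;
        }

        if (count < n) {
            tk[count].str = start;
            tk[count].length = (int)(p - start);
        }
        count++;
    }
    return count;
}

void token_print(const struct token tk[], int n)
{
    int i;

    for (i = 0; i < n; i++) {
        printf("[%d] '%.*s'\n", i, tk[i].length, tk[i].str);
    }
}

#define max_token 10

int main(void)
{
    const char *line = "the \"new york\" stock exchange";
    struct token tk[max_token];
    int n;

    n = tokenize(line, tk, max_token);
    if (n > max_token) n = max_token;       /* print only what was stored */
    token_print(tk, n);
    return 0;
}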

The start of each token is saved in a local variable and assigned to the token after it has been scanned. When p points to the character after the token, the expression:
p - start
gives you the length. (This is called pointer arithmetic.) The routine scans all tokens, but it assigns at most n of them so that it does not overflow the provided buffer.
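Because the routine keeps counting after the buffer is full, one possible use is a two-pass scheme: call it once with n set to 0 just to count, then allocate an array of exactly that size and call it again. The sketch below repeats the tokenizer so that it compiles on its own; the wrapper tokenize_alloc is a hypothetical addition, not part of the answer's code.

```c
#include <ctype.h>
#include <stdlib.h>

struct token {
    const char *str;
    int length;
};

/* Same scanning logic as the tokenizer above, repeated here so that the
   sketch is self-contained. */
int tokenize(const char *p, struct token tk[], int n)
{
    const char *start;
    int count = 0;

    while (*p) {
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0') break;
        start = p;
        if (*p == '"') {
            p++;
            while (*p && *p != '"') p++;
            if (*p == '\0') return -1;      /* quote not closed */
            p++;
        } else {
            while (*p && !isspace((unsigned char)*p) && *p != '"') p++;
        }
        if (count < n) {
            tk[count].str = start;
            tk[count].length = (int)(p - start);
        }
        count++;
    }
    return count;
}

/* Hypothetical wrapper: count first, then allocate exactly enough.
   Stores the token count (or -1) in *count. Returns NULL on an unclosed
   quote, an empty line or a failed malloc; the caller frees the result. */
struct token *tokenize_alloc(const char *line, int *count)
{
    struct token *tk;
    int n = tokenize(line, NULL, 0);        /* pass 1: nothing is stored */

    *count = n;
    if (n <= 0) return NULL;

    tk = malloc((size_t)n * sizeof *tk);
    if (tk == NULL) return NULL;

    tokenize(line, tk, n);                  /* pass 2: fill the array */
    return tk;
}
```

Passing NULL in the first pass is safe here only because count < n is never true when n is 0, so tk is never dereferenced. The price is that every line is scanned twice.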