First off, I visit this site quite regularly, but I've never posted. I find lots of help here and thought I'd return the favor. I recently worked on a project that required the strtok function. The problem I ran into was that strtok was changing my original string. I was finally able to fix it by tokenizing a copy, which leaves the original unchanged.
A little background: This is set up as a function of a child class. The char variable is obviously declared elsewhere, but I showed it for the sake of clarity. Also, strcpy_s was used because I have VS, however strcpy works also (parameters would be different). Hopefully with the comments, the rest of the code is clear enough to be understood easily:
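#include <cstdlib>   // atoi
#include <cstring>   // strtok, strcpy_s (VS)

char decimalNumber[] = "12.34";

int ChildClass::getNumberBeforeDecimal()
{
    char numBeforeDecimal[6] = "";
    // Copy decimalNumber so strtok modifies the copy, not the original.
    strcpy_s(numBeforeDecimal, sizeof(numBeforeDecimal), decimalNumber);
    // The first call takes the buffer and returns the first token, "12".
    // (A following strtok(NULL, ".") would return the next token, "34".)
    char* token = strtok(numBeforeDecimal, ".");
    return token ? atoi(token) : 0;   // converts "12" and returns 12
}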
Now, I haven't tested this exact code (feel free to correct it if I made a mistake). I took the code that I originally had written (yes, it worked!) and tried to make it generic enough to be understood without screwing it up. Oh, and I know there's an easier way of getting numbers before a decimal. This is just for the purpose of helping understand one use of strtok. With little effort this could be used to return the numbers after the decimal.
Feel free to post other uses of strtok if you want.
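As a rough sketch of the "numbers after the decimal" variation mentioned above: the free function name getNumberAfterDecimal, the 32-byte buffer, and the -1 error value are all my own choices for illustration, not part of the original class.

```cpp
#include <cstdlib>   // std::atoi
#include <cstring>   // std::strtok, std::strlen, std::memcpy

// Hypothetical helper (not from the original post): returns the digits after
// the first '.' as an int, or -1 if there is no second token or the input
// does not fit in the local buffer.
int getNumberAfterDecimal(const char* decimalNumber)
{
    char copy[32];
    const std::size_t len = std::strlen(decimalNumber);
    if (len >= sizeof(copy)) return -1;              // too long for the buffer
    std::memcpy(copy, decimalNumber, len + 1);       // copy including the '\0'

    if (std::strtok(copy, ".") == NULL) return -1;   // first token, e.g. "12"
    const char* token = std::strtok(NULL, ".");      // second token, e.g. "34"
    return token ? std::atoi(token) : -1;
}
```

Two caveats with this approach: strtok skips leading delimiters, so an input like ".34" has only one token and this sketch returns -1 for it, and atoi drops leading zeros, so "12.05" comes back as 5.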
I think that is exactly the part that trips most people up, and the reason for the OP's post. The documentation really should have extra-large, bold text in strobing red and orange that says:
strtok() changes your string!
This can be a particular problem if you are working with const or const-reference data and just tell the C++ compiler to shut up about the argument type warning (by casting away const, for example).
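To make the point concrete, here is a small sketch. The names splitCopy and lengthAfterFirstToken are made up for this post. The first function tokenizes a private copy so const data is never touched; the second shows what strtok does to the buffer it is handed.

```cpp
#include <cstring>   // std::strtok, std::strlen
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims` without
// modifying `text`, by running strtok over a private, writable copy.
std::vector<std::string> splitCopy(const std::string& text, const char* delims)
{
    std::vector<char> buf(text.begin(), text.end());
    buf.push_back('\0');                 // strtok needs a terminated buffer

    std::vector<std::string> tokens;
    for (char* t = std::strtok(buf.data(), delims); t != NULL;
         t = std::strtok(NULL, delims)) {
        tokens.push_back(t);
    }
    return tokens;
}

// Shows the side effect: strtok overwrites the delimiter that ends the token
// with '\0', so the buffer now reads as just "12".
std::size_t lengthAfterFirstToken()
{
    char buf[] = "12.34";
    std::strtok(buf, ".");
    return std::strlen(buf);             // 2, not 5
}
```

Copying costs an allocation, but it means no const_cast is needed and the caller's string stays intact.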
I think you would be better off refactoring the code to use strtok_r instead of strtok, since strtok is inherently neither thread-safe nor reentrant. For an example of why strtok_r would be better, consider this SSCCE:
(I changed use of strcpy_s to the more widely implemented strlcpy)
This is because the inner nested strtok call of your method overwrote the outer strtok cursor (strtok uses a static char * to keep a ref to the cursor).
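A minimal sketch of that kind of clobbering (this is my own illustration, not the example the reply refers to). It assumes a ';'-separated list of key=value pairs and a POSIX system where strtok_r is available; the function names are hypothetical.

```cpp
#include <cstring>    // std::strtok
#include <string.h>   // strtok_r (POSIX, not part of standard C++)

// Nested strtok: the inner calls overwrite strtok's single hidden cursor,
// so the outer loop resumes from the wrong place and stops early.
int countPairsWithStrtok(char* list)
{
    int pairs = 0;
    for (char* pair = std::strtok(list, ";"); pair != NULL;
         pair = std::strtok(NULL, ";")) {         // resumes from the INNER cursor
        char* key = std::strtok(pair, "=");       // clobbers the outer cursor
        char* value = std::strtok(NULL, "=");
        if (key != NULL && value != NULL) ++pairs;
    }
    return pairs;
}

// Same logic with strtok_r: each loop keeps its own cursor in a local
// variable, so the two tokenizers do not interfere.
int countPairsWithStrtokR(char* list)
{
    int pairs = 0;
    char* outerState = NULL;
    for (char* pair = strtok_r(list, ";", &outerState); pair != NULL;
         pair = strtok_r(NULL, ";", &outerState)) {
        char* innerState = NULL;
        char* key = strtok_r(pair, "=", &innerState);
        char* value = strtok_r(NULL, "=", &innerState);
        if (key != NULL && value != NULL) ++pairs;
    }
    return pairs;
}
```

Given a writable buffer holding "a=1;b=2", the strtok version counts 1 pair and the strtok_r version counts 2. On Visual Studio, strtok_s takes the same kind of context pointer and should be the closest equivalent.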
It's much clearer to use regular expressions to do stuff like this, especially in a larger context like the real one you're using it in, not just this simplified example.
You can get an excellent regular expression library from boost.org.
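#include <boost/regex.hpp>
using namespace boost;
const regex token("\\G(\\d+\\.\\d+)(:|$)");
smatch m;
std::string::const_iterator pos = s.cbegin();   // s is the std::string being parsed
while ( regex_search(pos, s.cend(), m, token) ) {
    pos = m[0].second;   // resume where this match ended
and so forth... There are a few more details.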
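For anyone without Boost handy, here is a rough equivalent using std::regex from C++11 instead. It is a swap of library, not the poster's code. std::regex has no \G, so this sketch passes match_continuous to anchor each search at the current position. The function name parseDecimalFields and the ':'-separated input format are assumptions based on the pattern above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: extracts "digits.digits" fields separated by ':'.
// Returns false if the input is empty, contains stray characters, or ends
// with a dangling ':'. On failure, `fields` may hold a partial result.
bool parseDecimalFields(const std::string& s, std::vector<std::string>& fields)
{
    static const std::regex token("(\\d+\\.\\d+)(:|$)");
    fields.clear();
    bool endedOnColon = false;
    std::string::const_iterator pos = s.begin();
    while (pos != s.end()) {
        std::smatch m;
        // match_continuous forces the match to start exactly at pos,
        // standing in for \G in the Boost pattern.
        if (!std::regex_search(pos, s.end(), m, token,
                               std::regex_constants::match_continuous)) {
            return false;
        }
        fields.push_back(m[1].str());
        endedOnColon = m[2].length() > 0;
        pos = m[0].second;               // resume where this match ended
    }
    return !fields.empty() && !endedOnColon;
}
```

With "12.34:56.78" this fills fields with "12.34" and "56.78" and returns true; "12.34:abc" is rejected rather than silently truncated.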
Admittedly, sometimes regular expressions are overkill.
It is my programming philosophy that if the input comes from a source that is not 100% reliable (such as the user), then you should not make any assumptions about its structure or correctness. It's always a judgment call about how loosely you may interpret the input, but, for example, if someone enters a telephone number as "617-55512-12", don't just strip out the dashes; assume that he made an error and that there is a very good chance that the number itself is not what the user intended. How anal you get about this depends in large part on the consequences of bad input: in other words, on whether you are writing a flight simulator or controlling the flight of real planes with real humans in them.
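A small sketch of the telephone example, assuming purely for illustration that the only accepted layout is NNN-NNN-NNNN. The function name and the format are my assumptions; real phone number rules are much messier.

```cpp
#include <regex>
#include <string>

// Hypothetical validator: accepts only three digits, a dash, three digits,
// a dash, and four digits. Anything else is treated as a probable typo
// instead of being "repaired" by stripping the dashes.
bool looksLikePhoneNumber(const std::string& input)
{
    static const std::regex pattern("\\d{3}-\\d{3}-\\d{4}");
    return std::regex_match(input, pattern);   // must match the whole string
}
```

Here "617-555-1212" is accepted, while "617-55512-12" is rejected even though stripping its dashes would produce ten plausible-looking digits. Whether to reject outright or ask the user to confirm is the judgment call described above.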