Super String
Last update: 3/29/99
Synopsis: A string class that includes many operations, buffer functionality, numeric conversions, complex searching, and of course char* compatibility.
Contents
Overview: what's it all about
Basic Operations: typical string class construction and access
String Operations: substrings, cut and paste, and such
Buffer Operations: it's only memory
Searching and Matching: finding things
Find Predicates: concerning the type SS::Find
Predefined Find Objects: whitespace, uppercase, etc.
Numerical Operations: formatting to/from numbers
Supported Types: types that can be converted and compared
Bounds Checking and Adjusting: when it does one of these
Exceptions: who throws what and when
Typedefs: a few type aliases
Constants: some constants
Supporting Classes: call them intermediate data types
STL Compatibility: mostly iterators
C Compatibility: the lowest common denominator
This class works pretty much like any other string class: feed it string literals, index the characters, let it do the char* memory management. Beyond that there's basic stuff, more methods, buffer methods, searching, and finally numbers.

These methods deal with accessing or manipulating ranges of characters. Such as:
They operate on a range of characters. There are three kinds of ranges:
firstChar and lastChar can throw an exception, see Exceptions.

A buffer is either:
What constitutes a buffer operation is a little loose:
A region (also called a reference) has a representation that points to an internal part of its parent string. The advantages are that no copying need be done to refer to a part of a string, and that the parent may be manipulated via this part of itself, perhaps with simplified logic. A region is a buffer in that there is no extra null-termination byte. Also, a copy of a reference is a reference, not a separate string. A region is ill-behaved in that it is invalidated when the parent goes out of scope, is deleted, or is assigned a different value. Value semantics are violated here.
Note: to create a buffer of, say, size 10 use
When Super String allocates the memory for its representation it always allocates an additional byte for zero termination. Since a non-zero-terminated Super String can easily be constructed with these methods, generalized code that handles Super Strings should not assume zero termination.

A view of the methods, from a distance:
A find always has a starting point and a direction (Next or Prev). Some have additional parameters to restrict the search or return values. Matching can take place at either a single location in a string or over a range of positions. Most matching is directionless; all matches are found and returned. Many of the parameters take default values. By default a forward search starts at the beginning of the string and a backward search at the end. Ranges default to the whole string or the rest of the string.
See Constants for more. For an explanation of ranges see String Operations.

Predicate objects should be stored and used as objects of SS::Find or as references or pointers to SS::Find. This requirement is due to the mechanism by which predicate objects are copied. An SS::Find object is either the base class portion of a complete object derived from SS::Finder or else it is a complete object that has been sliced to SS::Find. The act of slicing causes the complete object to be cloned and saved as a pointer to SS::Finder in the new SS::Find object. Additional predicates are derived from SS::Finder. If the predicate contains data that cannot be copied as values it should supply a copy constructor, a destructor, and a SS::Finder* clone() const; member. It should also supply a clone if it implements the found virtual function. Predicate objects are intended to be stateless, i.e. the result of a match should not depend upon the result of a previous match; this is not enforced, but stateful predicates invite confusion and unpredictability. Finally, a predicate that can be constructed with a single parameter of type SS::Find const & should have a protected or private copy constructor.

Convert a Super String to a number or explicitly convert a number to a Super String. Implicit conversion is via constructor, in base 10. See Supported Types. toX explicitly specifies a type; toInteger has the number supply it. An example of the latter is template use. Note that the string itself knows nothing about base: two Super Strings created from the same number but with different bases will not compare as equal. The above throw SS::ErrorNumberFormat, see Exceptions.
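The remark about bases can be illustrated without the library. The following is only a sketch: to_base and parse_base are hypothetical helpers written against std::string, not Super String members, and std::invalid_argument merely stands in for SS::ErrorNumberFormat.

```cpp
#include <stdexcept>
#include <string>

// Format a non-negative value in the given base (2..36), lowercase digits.
std::string to_base(unsigned long value, int base) {
    if (base < 2 || base > 36) throw std::invalid_argument("base out of range");
    const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;
    do {
        out.insert(out.begin(), digits[value % base]);
        value /= base;
    } while (value != 0);
    return out;
}

// Parse text in the given base; throws on any character that is not a digit
// of that base. Assumes an ASCII-like character set with contiguous letters.
unsigned long parse_base(std::string const& text, int base) {
    if (base < 2 || base > 36) throw std::invalid_argument("base out of range");
    if (text.empty()) throw std::invalid_argument("empty number");
    unsigned long value = 0;
    for (char c : text) {
        int d = base;  // stays invalid unless c is recognized below
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'z') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'Z') d = c - 'A' + 10;
        if (d >= base) throw std::invalid_argument("bad digit");
        value = value * base + d;
    }
    return value;
}
```

The strings produced for 255 in base 16 and in base 10 differ as text, so a plain string comparison reports them as unequal even though they came from the same number.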
A type is considered fully supported if an object of that type can convert to and from a Super String, be assigned to or concatenated with a Super String, and be compared to a Super String using any relational operator. This section describes the mechanism to accomplish these operations. Here are the types currently fully supported:
Arbitrary types can be added by means of non-member functions that are called by the appropriate template member function. For example, to allow an object of type X to convert to a Super String supply a SSconvert (X, SS&), rather than X::operator SS(). These functions implement Super String support for a given type. Additional supported types can be added by the user. It is not required that all these functions be supplied. SSconvert is typically the most useful. Note that a problematic template <class T> SS::operator T() could be defined.

Binary operations with Super String are implemented as member template functions for Super String on the left, and as non-member template functions for Super String on the right. Concatenation is performed by converting the operand to Super String and concatenating the strings. Thus, the corresponding SSconvert must be defined. Relational operators require the definition of the appropriate SScompare. This additional function allows more flexibility and potentially greater performance than converting the operand to a string and then comparing as strings. Comparisons are assumed to commute.

An out of range single character access is definitely considered to be an error. Some of Super String's methods allow negative lengths and/or starting points that lie outside the string. This is to facilitate usage in algorithms that generate endpoints that are out of range or arbitrarily ordered. Thus it is not necessary to manually truncate and order the range before use. The tradeoff for this feature is reduced error detection. The range will be normalized into a valid range.
A range completely outside of the string is a null operation. Methods that operate on a range include get, set, cut, removeRange, replaceRange, and operator(). An out of range position for paste will result in a prepend or an append. An out of range multiple character access is considered possibly correct.

Some member functions throw exceptions. The root of the exception hierarchy is SS::Error, so catching it will also catch the rest. A char const * what() const method will give a descriptive error message. The exception can be caught as a reference or a value, i.e.

See the STL documentation for information on std::vector<>. See Supporting Classes for Pair<>.

Integral constants are currently implemented as enums.

The nested class SS::Buffer allows the functionality of SS::buffer(int n) and SS::buffer(void const * v, int n) (see Buffer Operations) to be used in a constructor. The latter is needed because SS(void const * v, int n) (see Basic Operations) has copy semantics; the former because SS(int) (see Supported Types) means to convert as a decimal integer. Code that treats characters as ints also discourages such a constructor.

Super String's buffer constructor SS(SS::Buffer const &) has two flavors: produce a string with a specific length or specify the region of memory for the string. This second flavor produces a string that has not allocated its memory, will not delete it, and has an internal char* representation that is not necessarily zero terminated. This last fact means that Super String's implementation, or code that accesses a Super String through its char* representation, cannot in general assume zero termination. The SS::Buffer object is passed to SS(SS::Buffer const &) and is processed as follows:

A substring is either a Super String or else an object that can be assigned to. This assignment is then performed by replacing the characters of the original string, in the given range, with the right hand side of the assignment.
Alternately, the substring can be used as a Super String in which case a copy of the range in the original string is generated. SS::SubSS is not a user accessible type. See also String Operations.

The class Pair is used by Super String as one way to represent a range (see String Operations). It is an instance of a template class.

Super String supports STL random-access iterators. See the STL documentation.

See also Supported Types.

Overview
Basic Operations
This is standard fare for C++ string classes. The functionality of length, size, empty, operator [], at, c_str, data, and erase is compatible with std::string. The operator [] and at can throw an exception, see Exceptions.
SS ();
construct a Super String with a value of ""

template <class T> SS (T const & t);
construct a Super String for any supported type

SS (SS const & s);
copy constructor for Super String

SS (void const * v, int n);
construct a Super String by copying n bytes starting at v

int length () const;
int size () const;
the number of characters in the string

bool empty () const;
returns true if the string is empty; false otherwise
an empty string will have a length of zero and a value of ""

char & operator [] (int ind);
char const & operator [] (int ind) const;
return a reference to the character at position ind

char & at (int ind);
char const & at (int ind) const;
same as operator [] (int)

operator char * ();
operator char const * () const;
returns a pointer to the first char of the string
see C Compatibility for more

char * c_str ();
char const * c_str () const;
char * data ();
char const * data () const;
same as operator char * ()

SS clone () const;
return a copy of the string

SS& erase ();
erase the string, i.e. set it to ""
return a reference to the erased string
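The templated constructor above relies on the non-member function mechanism described under Supported Types. A rough sketch of that idea follows; MiniString and convert_hook are hypothetical stand-ins for SS and SSconvert, and the real library's details may differ.

```cpp
#include <string>

// Conversion hooks for built-in types; declared before the class so that
// ordinary lookup inside the template can see them.
inline void convert_hook(int value, std::string& out) { out = std::to_string(value); }
inline void convert_hook(char const* value, std::string& out) { out = value; }

// Stand-in for Super String; only the conversion mechanism is modeled.
class MiniString {
public:
    MiniString() {}
    // Any type with a matching convert_hook overload is accepted here.
    template <class T>
    MiniString(T const& t) { convert_hook(t, text_); }
    std::string const& str() const { return text_; }
private:
    std::string text_;
};

// A user type added later, without touching MiniString or Point itself.
struct Point { int x, y; };
inline void convert_hook(Point const& p, std::string& out) {
    out = "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
}
```

The hook for Point is found through argument-dependent lookup when the constructor is instantiated, which is what lets new types be supported with a free function instead of a conversion operator on the type.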
String Operations
Some methods can take any of these kinds of ranges as input. Methods that return ranges do so as either Pair<int> or std::vector <Pair<int>> beglen.
int beg, int len: where beg is the starting position, len the number of characters.
Pair<int> beglen: with beglen._0 the start and beglen._1 the length. See Supporting Classes.
std::vector <Pair<int>> beglen: multiple ranges can be operated on.
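A beg/len range may be out of bounds or have a negative length and is then normalized, as described under Bounds Checking and Adjusting. The sketch below shows one plausible normalization. The helper normalize_range is hypothetical, and the reading of a negative len as the half-open interval [beg+len, beg) is an assumption, not something the documentation states.

```cpp
#include <algorithm>
#include <utility>

// Normalize (beg, len) against a string of length n.
// Returns (start, length); a range entirely outside the string has length 0.
// Assumption: a negative len denotes the half-open interval [beg+len, beg).
std::pair<int, int> normalize_range(int beg, int len, int n) {
    long long lo = beg;
    long long hi = static_cast<long long>(beg) + len;  // wide type avoids overflow
    if (hi < lo) std::swap(lo, hi);                    // order the endpoints
    lo = std::max(0LL, std::min<long long>(lo, n));    // clamp into [0, n]
    hi = std::max(0LL, std::min<long long>(hi, n));
    return std::make_pair(static_cast<int>(lo), static_cast<int>(hi - lo));
}
```

With a 10 character string, (-2, 5) becomes (0, 3), (8, 10) becomes (8, 2), and (12, 4) collapses to a zero-length range, which corresponds to a null operation.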
substring
SubSS sub (int beg, int len);
SS sub (int beg, int len) const;
SubSS operator () (int beg, int len);
SS operator () (int beg, int len) const;
SubSS operator () (Pair<int> const & beglen);
SS operator () (Pair<int> const & beglen) const;
return a substring representing a range of characters
a substring of a non-constant string can be assigned to
see Supporting Classes for more on SS::SubSS
get
SS get (int beg, int len=1) const;
SS getFrom (int beg) const;
SS get (Pair<int> const & beglen) const;
SS get (std::vector< Pair<int> > const & beglen) const;
std::vector<SS>& get (std::vector<SS>& s, std::vector< Pair<int> > const & beglen) const;
copy a range of characters into a new string
getFrom gets all the way to the end of the string
get(std::vector< Pair<int> > const & beglen) concatenates the ranges
get(std::vector<SS>& s, std::vector< Pair<int> > const & beglen) stores each range as an element of s and returns s
set
SS& set (SS const & s, int pos=0);
SS& set (SS const & s, int pos, int beg, int len);
SS& set (SS const & s, int pos, Pair<int> const & beglen);
SS& set (SS const & s, std::vector<int> const & pos);
SS& set (std::vector<SS> const & s, std::vector<int> const & pos);
assign new values to a range of characters starting at pos
beg, len, and beglen refer to the argument string s
set does not extend the string
the updated string is returned
the last two overloads set s, or the corresponding element of s, at each position in pos
cut
SS cut (int beg, int len=1);
SS cutFrom (int beg);
SS cut (Pair<int> const & beglen);
SS& cut (SS& s, int beg, int len=1);
SS& cut (SS& s, Pair<int> const & beglen);
std::vector<SS>& cut (std::vector<SS>& s, std::vector< Pair<int> > const & beglen);
delete a range of characters and return them
beg and len or beglen specify the range or ranges to be cut
cutFrom cuts all the way to the end of the string
either a new string or s is returned
the last overload makes multiple cuts at once
paste
SS& paste (SS const & s, int pos=0);
SS& paste (SS const & s, int pos, int beg, int len);
SS& paste (SS const & s, int pos, Pair<int> const & beglen);
SS& paste (SS const & s, std::vector<int> const & pos);
SS& paste (std::vector<SS> const & s, std::vector<int> const & pos);
insert s before the position pos
beg, len, and beglen refer to the argument string s
a value for pos of the string's length or SS::fullength results in an append
the updated string is returned
the last two overloads paste s, or the corresponding element of s, at each position in pos
removeRange
SS& removeRange (int beg=0, int len=fullength);
SS& removeRange (Pair<int> const & beglen);
SS& removeRange (std::vector< Pair<int> > const & beglen);
delete the specified range or ranges of characters
return the updated string
replaceRange
SS& replaceRange (SS const & newseq, int beg=0, int len=fullength);
SS& replaceRange (SS const & newseq, Pair<int> const & beglen);
SS& replaceRange (SS const & newseq, std::vector< Pair<int> > const & beglen);
SS& replaceRange (std::vector<SS> const & s, std::vector< Pair<int> > const & beglen);
replace a range of characters with newseq
beg and len or beglen specify the range or ranges to be replaced
the updated string is returned
the last overload replaces each range in beglen with the corresponding element of s
reverse
SS& reverse (int beg=0, int len=fullength);
SS& reverse (Pair<int> const & beglen);
SS& reverse (std::vector< Pair<int> > const & beglen);
SS& itemReverse (std::vector< Pair<int> > const & beglen);
SS& tailReverse (int len);
reverse the order of characters within a range or ranges
with multiple ranges each range is considered distinct
itemReverse reverses the ranges considered as distinct strings, rather than as individual characters
tailReverse reverses the last len characters
return the updated string
sort
SS& sort (int beg=0, int len=fullength);
SS& sort (Pair<int> const & beglen);
SS& sort (std::vector< Pair<int> > const & beglen);
SS& itemSort (std::vector< Pair<int> > const & beglen);
SS& tailSort (int len);
sort the characters within a range or ranges
with multiple ranges each range is considered distinct
itemSort sorts the ranges considered as distinct strings, rather than as individual characters
tailSort sorts the last len characters
return the updated string
fill
SS& fill (char c, int beg=0, int len=fullength);
SS& fill (char c, Pair<int> const & beglen);
SS& fill (char c, std::vector< Pair<int> > const & beglen);
set each character in the range or ranges to c
return the updated string
repeat
SS& repeat (SS const & s, int beg=0, int len=fullength);
SS& repeat (SS const & s, Pair<int> const & beglen);
SS& repeat (SS const & s, std::vector< Pair<int> > const & beglen);
like fill except with a string argument
does not change the length of the string

SS head (int len) const;
return up to the first len characters of the string

SS tail (int len) const;
return up to the last len characters of the string

char & firstChar ();
char const & firstChar () const;
return a reference to the first character of the string

char & lastChar ();
char const & lastChar () const;
return a reference to the last character of the string

bool isUpperCase (int pos) const;
bool isLowerCase (int pos) const;
bool isWhiteSpace (int pos) const;
bool isBlackSpace (int pos) const;
bool isAlpha (int pos) const;
bool isDigit (int pos) const;
bool isAlphaNumeric (int pos) const;
bool isPunct (int pos) const;
bool isPrintable (int pos) const;
bool isHexDigit (int pos) const;
bool isCntrl (int pos) const;
bool isGraph (int pos) const;
determine a classification of the character at pos
these are as per clib isspace, isupper, etc.

SS& toLower ();
SS& toLower (int pos);
convert the character at pos or the entire string to lowercase
return the updated string

SS& toUpper ();
SS& toUpper (int pos);
convert the character at pos or the entire string to uppercase
return the updated string

int compare (X x) const;
compare string to an object x
see Supported Types

int compareNoCase (SS const & s) const;
do a case insensitive compare

int compare (void const * v, int n) const;
compare string to the buffer v with a length of n characters

bool compare (Find const & f) const;
determine if the entire string is matched by f
see Find Predicates

SS trim () const;
remove leading and trailing whitespace

SS dup (int n=1) const;
create a string that duplicates the current string n times
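The cut and paste semantics described in this section can be sketched with std::string. The free functions below are hypothetical illustrations, not the Super String members. They assume an in-range cut, and they follow the documented rule that an out of range paste position results in a prepend or an append.

```cpp
#include <string>

// Remove len characters starting at beg and return them.
// Sketch only: beg and len are assumed to be in range.
std::string cut(std::string& s, int beg, int len) {
    std::string removed = s.substr(beg, len);
    s.erase(beg, len);
    return removed;
}

// Insert piece before pos and return the updated string.
// A position before the start prepends; one past the end appends.
std::string& paste(std::string& s, std::string const& piece, int pos) {
    int n = static_cast<int>(s.size());
    if (pos < 0) pos = 0;
    if (pos > n) pos = n;
    s.insert(pos, piece);
    return s;
}
```

Cutting (5, 6) out of "hello world" returns " world" and leaves "hello"; pasting at a position beyond the end then behaves as an append.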
Buffer Operations
Note: SS s (SS::Buffer(10)) creates a buffer of size 10, whereas the expression SS s(10) will produce the string "10".
SS (Buffer const & b);
construct a Super String via an SS::Buffer object
see Supporting Classes
currently implemented by template, see Supported Types

SS::Buffer (int n=0, SS const & fillvalue="");
direct SS (Buffer const & b) to create a string of length n, possibly filled with fillvalue

SS::Buffer (void const * start, int n);
direct SS (Buffer const & b) to assign start and n to the string's internal representation
see SS::buffer (void const * v, int n)

SS& buffer (int n);
change string to hold n characters
current contents are destroyed; buffer is zero terminated
returns the altered string

SS& buffer (void const * v, int n);
assign v and n to the string's internal representation
no memory is allocated or copied
the string will not try to free v when it goes out of scope
the string is not guaranteed to be zero terminated
returns the altered string

SS& resize (int n);
shrink or expand the string
current contents are preserved
returns the altered string

SS& resizeToNullTerminator ();
resize the string, keeping the contents up to the first embedded null

SS getRegion (int beg=0, int len=fullength) const;
SS getRegion (Pair<int> const & beglen) const;
std::vector<SS>& getRegion (std::vector<SS>& s, std::vector< Pair<int> > const & beglen) const;
create a string that is a reference, or region, to a part of its parent
beg and len or beglen is the range to which the region refers
the last overload creates a region for each range in beglen

char* extract ();
take over the memory management for the internal representation
the internal rep will not be deleted when the string goes out of scope
returns the char* internal rep

SS& zero ();
SS& zero (int beg, int len);
fill the string or a range with the null character, '\000'

template <class T> static inline void zero (T& t);
zero out an arbitrary object
use SS::zero() or SS::zero(int,int) for a Super String
the class T should not have any virtual functions

template <class T> static inline void fillObject (T& t, char c);
fill an arbitrary object with c
use SS::fill(char,..) for a Super String
the class T should not have any virtual functions

template <class T> static inline SS fromObject (T& t);
create a string by copying the bytes from t

SS const & copyTo (void* dst, int n=fullength, int beg=0) const;
copy bytes from the string to dst
n is the number of bytes; beg is the starting position in the string
returns the unaltered string

SS & copyFrom (void* src, int n=fullength, int beg=0);
copy bytes from src to the string
up to n bytes will be copied to the string starting at beg
the string will not be extended
returns the altered string

SS & assignFrom (void* src, int n);
copy n bytes to the string, starting from src
current contents of the string are lost
returns the altered string

template <class T> SS const & copyToObject (T& t, int beg=0) const;
copy bytes from the string to an arbitrary object
beg is the starting position in the string
returns the unaltered string

template <class T> SS & copyFromObject (T& t, int beg=0);
copy bytes from an arbitrary object into the string
beg is the starting position in the string
the string will not be extended
returns the altered string

template <class T> SS & assignFromObject (T& t);
copy the bytes from an arbitrary object into the string
current contents of the string are lost
returns the altered string

SS& become (SS& s);
take over the contents of s
returns the altered string

SS& swap (SS& s);
swap contents with s
returns the altered string

SS& swap (int pos0, int pos1);
swap the values at pos0 and pos1
returns the altered string
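The region idea, a reference into a parent string with no terminating null of its own, can be sketched as a pointer plus a length. Region, make_region, and copy_out are hypothetical names used only for illustration; they are not part of Super String.

```cpp
#include <cstddef>
#include <string>

// A non-owning reference to part of a parent buffer.
// There is no zero byte at the end of the region itself.
struct Region {
    char* start;
    std::size_t length;
};

// Refer to [beg, beg+len) of parent without copying.
// Sketch only: the range is assumed to lie inside parent.
Region make_region(std::string& parent, std::size_t beg, std::size_t len) {
    Region r = { &parent[beg], len };
    return r;
}

// Copy the region out using its explicit length; a strlen on r.start
// would read past the end of the region.
std::string copy_out(Region const& r) {
    return std::string(r.start, r.length);
}
```

Writing through the region changes the parent, which is the convenience described earlier; the same sharing is why the region becomes invalid once the parent is destroyed or reassigned.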
Searching and Matching
These are methods that find sequences of characters in a string that either match another string or satisfy some criterion. You can search for the following:
findNext
int findNext (SS const & sequence, int beg=0, SS& result=nullref) const;
int findPrev (SS const & sequence, int beg=maxindex, SS& result=nullref) const;
int findNext (Find const & finder, int beg=0, SS& result=nullref) const;
int findPrev (Find const & finder, int beg=maxindex, SS& result=nullref) const;
int findNextNoCase (SS const & sequence, int beg=0, SS& result=nullref) const;
int findPrevNoCase (SS const & sequence, int beg=maxindex, SS& result=nullref) const;
int findNextOf (SS const & charset, int beg=0, SS& result=nullref) const;
int findPrevOf (SS const & charset, int beg=maxindex, SS& result=nullref) const;
starting at beg, try each position until a match is found and return that position
if no match is found return SS::notfound
the matching characters are returned in result, if one is supplied
the Next methods search forward, the Prev backward
the NoCase versions do a case insensitive compare
findNextOf and findPrevOf search for any character in charset
findNextMatch
Pair<int> findNextMatch (SS const & sequence, int beg=0) const;
Pair<int> findPrevMatch (SS const & sequence, int beg=maxindex) const;
Pair<int> findNextMatch (Find const & finder, int beg=0) const;
Pair<int> findPrevMatch (Find const & finder, int beg=maxindex) const;
Pair<int> findNextNoCaseMatch (SS const & sequence, int beg=0) const;
Pair<int> findPrevNoCaseMatch (SS const & sequence, int beg=maxindex) const;
searching is performed in the same manner as described in findNext
the return value is the range in the string of the match
if no match is found SS::nomatch is returned.
findNextString
SS findNextString (SS const & sequence, int beg=0 ) const;
SS findPrevString (SS const & sequence, int beg=maxindex) const;
SS findNextString (Find const & finder, int beg=0 ) const;
SS findPrevString (Find const & finder, int beg=maxindex) const;
SubSS findNextSubString (SS const & sequence, int beg=0 );
SubSS findPrevSubString (SS const & sequence, int beg=maxindex);
SubSS findNextSubString (Find const & finder, int beg=0 );
SubSS findPrevSubString (Find const & finder, int beg=maxindex);
searching is performed in the same manner as described in findNext
the return value is the string that was found
if no match was found throw SS::ErrorNotFound
the String and SubString versions return Super String and SS::SubSS, respectively
find
int find (Find const & finder, int beg, int* len=0, int end=fullength, int inc=1) const;
int rfind (Find const & finder, int beg, int* len=0, int end=fullength) const;
int find (SS const & sequence, int beg, int* len=0, int end=fullength, int inc=1) const;
int rfind (SS const & sequence, int beg, int* len=0, int end=fullength) const;
beg is the first position checked, end is the last
inc is the stepsize of the search, with rfind using -1
returns SS::notfound if no match is found at the positions tried
the match length is returned in len, if supplied
a value of SS::fullength for end refers to the beginning or end of the string depending upon direction
contains
bool contains (SS const & sequence, int beg=0, int len=fullength) const;
bool contains (Find const & finder, int beg=0, int len=fullength) const;
bool contains (SS const & sequence, Pair<int> const & beglen) const;
bool contains (Find const & finder, Pair<int> const & beglen) const;
bool contains (SS const & sequence, std::vector< Pair<int> > const & beglen) const;
bool contains (Find const & finder, std::vector< Pair<int> > const & beglen) const;
returns true if the sequence or finder is found anywhere in the range or ranges; false otherwise
population
int population (SS const & sequence, int beg=0, int len=fullength) const;
int population (Find const & finder, int beg=0, int len=fullength) const;
int population (SS const & sequence, Pair<int> const & beglen) const;
int population (Find const & finder, Pair<int> const & beglen) const;
int population (SS const & sequence, std::vector< Pair<int> > const & beglen) const;
int population (Find const & finder, std::vector< Pair<int> > const & beglen) const;
counts the number of occurrences of sequence or finder in the range or ranges
returns the total found
remove
SS& remove (SS const & oldseq, int beg=0, int len=fullength);
SS& remove (Find const & finder, int beg=0, int len=fullength);
SS& remove (SS const & oldseq, Pair<int> const & beglen);
SS& remove (Find const & finder, Pair<int> const & beglen);
SS& remove (SS const & oldseq, std::vector< Pair<int> > const & beglen);
SS& remove (Find const & finder, std::vector< Pair<int> > const & beglen);
deletes every occurrence of oldseq or finder in the range or ranges
returns the altered string
replace
SS& replace (SS const & oldseq, SS const & newseq, int beg=0, int len=fullength);
SS& replace (Find const & finder, SS const & newseq, int beg=0, int len=fullength);
SS& replace (SS const & oldseq, SS const & newseq, Pair<int> const & beglen);
SS& replace (Find const & finder, SS const & newseq, Pair<int> const & beglen);
SS& replace (SS const & oldseq, SS const & newseq, std::vector< Pair<int> > const & beglen);
SS& replace (Find const & finder, SS const & newseq, std::vector< Pair<int> > const & beglen);
replaces every occurrence of oldseq or finder in the range or ranges with newseq
returns the altered string
removeForward
int removeForward (SS const & oldseq, int count=1, int beg=0, int len=fullength);
int removeForward (Find const & finder, int count=1, int beg=0, int len=fullength);
int removeBackward (SS const & oldseq, int count=1, int beg=0, int len=fullength);
int removeBackward (Find const & finder, int count=1, int beg=0, int len=fullength);
directional remove, up to count items are deleted in the range
returns the number of items actually deleted
the Forward version starts at the beginning of the range; the Backward the end
count can be SS::allitems
replaceForward
int replaceForward (SS const & oldseq, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceForward (Find const & finder, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceBackward (SS const & oldseq, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceBackward (Find const & finder, SS const newseq, int count=1, int beg=0, int len=fullength);
just like removeForward except that matches are replaced with newseq
match (at a single position)
bool match (SS const & sequence, int pos, SS& result=nullref) const;
bool match (Find const & finder, int pos, SS& result=nullref) const;
if sequence or finder matches at position pos return true; false otherwise
if there is a match return the matched characters in result, if supplied
match (over a range)
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, Pair<int> const & beglen_src) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, Pair<int> const & beglen_src) const;
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, std::vector< Pair<int> > const & beglen_src) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, std::vector< Pair<int> > const & beglen_src) const;
find all the matches in the range or ranges and store them in beglen
the return value is also beglen
if no matches are found beglen will have a size of zero
matchForward
std::vector< Pair<int> >& matchForward (SS const & sequence, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchForward (Find const & finder, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchBackward (SS const & sequence, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchBackward (Find const & finder, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
directional match, up to count items are found in the range
returns the number of items actually found
beglen is as in match
the Forward version starts at the beginning of the range; the Backward the end
count can be SS::allitems
tokenize
std::vector< Pair<int> >& tokenize (std::vector< Pair<int> >& beglen) const;
std::vector< Pair<int> >& tokenize (SS const & delimeter, std::vector< Pair<int> >& beglen) const;
std::vector< Pair<int> >& tokenize (Find const & finder, std::vector< Pair<int> >& beglen) const;
break string into tokens and store starting positions and lengths in beglen
use whitespace, a string, or an SS::Find object to delimit tokens
returns beglen
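As an illustration of the whitespace form of tokenize, the following sketch records each token as a (start, length) pair. It is written against std::string with std::pair standing in for Pair<int>, the name tokenize_ws is hypothetical, and clearing beglen before filling it is an assumption about the behavior.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<int, int> Range;  // (start, length), standing in for Pair<int>

// Split s on runs of whitespace; store each token's start and length in beglen.
// Assumption: any previous contents of beglen are discarded.
std::vector<Range>& tokenize_ws(std::string const& s, std::vector<Range>& beglen) {
    beglen.clear();
    int n = static_cast<int>(s.size());
    int pos = 0;
    while (pos < n) {
        // skip the delimiter run
        while (pos < n && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
        int start = pos;
        // consume the token
        while (pos < n && !std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
        if (pos > start) beglen.push_back(Range(start, pos - start));
    }
    return beglen;
}
```

Because only positions and lengths are stored, no characters are copied; the ranges can afterwards be handed to any of the range-taking operations.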
Find Predicates
How to use it
These predicates are for use with the various searching and matching functions in Super String (see Searching and Matching). The predicate objects have the base type SS::Find and may be created and stored, or simply created as temporaries when the desired Super String member function is called. They are nested classes of Super String. Predicates that take arguments of type SS::Find const & allow composition of arbitrarily complex predicates. New predicate classes may be derived from SS::Finder. They needn't be nested classes of Super String and no privileged access is required. There are some predefined find objects to perform a few simple searches.
How it works
A member function using a find predicate will typically examine a range of positions in the string. At each one the predicate's found method will be invoked:
virtual bool found (SS const & s, int pos, int& len) const;
where s is usually *this and pos is the position for which the match is being checked. If the predicate reports a match it will return the number of characters matched in len.
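The found protocol can be sketched in isolation. Here Predicate, DigitRun, and find_next are hypothetical stand-ins written against std::string, and the value chosen for notfound is an assumption, since the documentation does not state the value of SS::notfound.

```cpp
#include <string>

const int notfound = -1;  // assumed stand-in for SS::notfound

// Minimal predicate interface modeled on found(s, pos, len).
struct Predicate {
    virtual ~Predicate() {}
    virtual bool found(std::string const& s, int pos, int& len) const = 0;
};

// Matches a run of one or more decimal digits starting at pos.
struct DigitRun : Predicate {
    bool found(std::string const& s, int pos, int& len) const {
        int n = static_cast<int>(s.size());
        int end = pos;
        while (end < n && s[end] >= '0' && s[end] <= '9') ++end;
        len = end - pos;  // number of characters matched
        return len > 0;
    }
};

// Forward search: try each position from beg until the predicate matches.
int find_next(std::string const& s, Predicate const& p, int beg, int& len) {
    int n = static_cast<int>(s.size());
    for (int pos = beg < 0 ? 0 : beg; pos < n; ++pos)
        if (p.found(s, pos, len)) return pos;
    return notfound;
}
```

The predicate keeps no state between calls, in line with the recommendation that the result of one match should not depend on a previous one.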
Find
    base class of Super String's find hierarchy
Finder
    base class for deriving additional Find classes
FindChar (char c);
    matches if the character at the search position is c and returns the character
FindCharNoCase (char c);
    like FindChar but case insensitive
FindFunc (Func f);
    calls f with the character at the search position and, if true, returns that character
    Func is of type bool(*)(char c)
FindSet (SS const & charset);
    matches any character in charset and returns a match
FindString (SS const & str);
    matches str at the search position and returns it
FindVector (int n=0);
    a base class to manage a vector of find predicates
FindSequence (F f0, F f1);
FindSequence (F f0, F f1, F f2);
FindSequence (F f0, F f1, F f2, F f3);
FindSequence (F f0, F f1, F f2, F f3, ...);
    the arguments are all of type SS::Find const &
    each match is performed at the character following the previous, and if all match the union is returned
    additional components can be added with void add (F f);
FindStringNoCase (SS const & s);
    does a case insensitive match of s at the search position
FindProxy (Find const & f);
    base class for find predicates that take a SS::Find const & as an argument
FindNot (Find const & f);
    matches if f doesn't match; doesn't match if f matches
    the length of the match is determined by f and may not be meaningful
FindOr (F f0, F f1);
FindOr (F f0, F f1, F f2);
FindOr (F f0, F f1, F f2, F f3);
FindOr (F f0, F f1, F f2, F f3, ...);
    the arguments are all of type SS::Find const &
    all the components are tried and the longest match, if any, is returned
    additional components can be added with void add (F f);
FindAnd (F f0, F f1);
FindAnd (F f0, F f1, F f2);
FindAnd (F f0, F f1, F f2, F f3);
FindAnd (F f0, F f1, F f2, F f3, ...);
    the arguments are all of type SS::Find const &
    all the components are tried and if all match, the longest one is returned
    additional components can be added with void add (F f);
FindCharCompare (char w, Comp comp);
    compares the character at the search position to w using the comparison function comp
    Comp is of type bool(*)(char u, char w);
    with the first parameter from the string, the second from the supplied character
FindCharGreaterThan (char w);
    like FindCharCompare, requiring the character in the string to be greater than w
FindCharGreaterThanEqual (char w);
    like FindCharCompare, requiring the character in the string to be greater than or equal to w
FindCharLessThan (char w);
    like FindCharCompare, requiring the character in the string to be less than w
FindCharLessThanEqual (char w);
    like FindCharCompare, requiring the character in the string to be less than or equal to w
FindCharInRange (char w_low, char w_high);
    find a character between w_low and w_high, inclusive
FindCompare (Find const & f, SS const & w, Comp comp);
    matches f and then compares a successful match to w using the comparison function comp
    Comp is of type bool(*)(SS const & u, SS const & w);
    with the first parameter from the successful match, the second from the supplied string
FindGreaterThan (Find const & f, SS const & w);
    like FindCompare, requiring the successful match to be greater than w
FindGreaterThanEqual (Find const & f, SS const & w);
    like FindCompare, requiring the successful match to be greater than or equal to w
FindLessThan (Find const & f, SS const & w);
    like FindCompare, requiring the successful match to be less than w
FindLessThanEqual (Find const & f, SS const & w);
    like FindCompare, requiring the successful match to be less than or equal to w
FindInRange (Find const & f, SS const & w_low, SS const & w_high);
    find a match for f between w_low and w_high, inclusive
FindBool (bool b, int l=0);
    either always or never a match, with length of l
    l may be SS::fullength
FindPosition (int n);
    matches, with length 1, if the position in the string is n
    n may be SS::maxindex
FindRange (int n, int l);
    matches, with length l, if the position in the string is n and there are at least l characters remaining
    n may be SS::maxindex and l may be SS::fullength
FindDisplacedBy (Find const & f, int n);
    matches f, displaced by n characters
    n > 0 tries f at n characters forward, i.e. the found position will be n characters before the match
FindMultiple (Find const & f, int n=1);
    matches f, exactly n times in sequence
FindZeroOrMore (Find const & f);
    matches f as many times as possible, in sequence
FindOneOrMore (Find const & f);
    matches f at least once and as many times as possible, in sequence
FindZeroOrOne (Find const & f);
    matches f either not present or once
FindAtLeast (Find const & f, int n);
    matches f at least n times and as many times as possible, in sequence
FindUpTo (Find const & f, int n);
    matches f at least once and up to n times, in sequence
FindAfter (Find const & f0, Find const & f1);
    matches f0 and then searches for f1, starting at the first character after f0's match
    a successful match includes all the text from both matches and everything in between
FindWithin (Find const & f0, Find const & f1);
    matches f0 and then searches for f1 within f0's match and if found f0's match is returned
FindSeparatedBy (Find const & f0, Find const & f1, int n);
    like FindAfter but the two matches must be separated by exactly n characters
FindCloserThan (Find const & f0, Find const & f1, int n);
    like FindAfter but the two matches must be separated by less than n characters
FindFartherThan (Find const & f0, Find const & f1, int n);
    like FindAfter but the two matches must be separated by more than n characters
FindDelimit (char c);
FindDelimit (char left, char right);
    matches the text between a left and a right delimiting character or just a single delimiting character
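To make the composition rules more concrete, here is a minimal sketch of two combinators that behave the way FindOneOrMore and FindOr are described above (repeat in sequence; try every component and keep the longest match). It is not Super String's implementation. std::string stands in for SS, and Pred, Digit, Literal, OneOrMore, and Or are hypothetical names. These stand-ins hold references to their components, so the components must outlive the composite; whether the real classes copy their arguments is not stated in the text.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for SS::Find; std::string plays the role of SS.
struct Pred {
    virtual ~Pred() {}
    virtual bool found(std::string const& s, int pos, int& len) const = 0;
};

// Matches one decimal digit.
struct Digit : Pred {
    bool found(std::string const& s, int pos, int& len) const override {
        if (pos < 0 || pos >= static_cast<int>(s.size())) return false;
        if (!std::isdigit(static_cast<unsigned char>(s[pos]))) return false;
        len = 1;
        return true;
    }
};

// Matches a fixed literal at the search position.
struct Literal : Pred {
    std::string text;
    explicit Literal(std::string const& t) : text(t) {}
    bool found(std::string const& s, int pos, int& len) const override {
        if (pos < 0 || pos > static_cast<int>(s.size())) return false;
        if (s.compare(pos, text.size(), text) != 0) return false;
        len = static_cast<int>(text.size());
        return true;
    }
};

// Matches inner at least once and as many times as possible, in sequence.
struct OneOrMore : Pred {
    Pred const& inner;  // must outlive this object
    explicit OneOrMore(Pred const& p) : inner(p) {}
    bool found(std::string const& s, int pos, int& len) const override {
        int total = 0, step = 0;
        // Stop on the first failure; also stop on empty matches to avoid looping forever.
        while (inner.found(s, pos + total, step) && step > 0) total += step;
        if (total == 0) return false;
        len = total;
        return true;
    }
};

// Tries both components at the same position and reports the longer match.
struct Or : Pred {
    Pred const& a;  // must outlive this object
    Pred const& b;
    Or(Pred const& x, Pred const& y) : a(x), b(y) {}
    bool found(std::string const& s, int pos, int& len) const override {
        int la = 0, lb = 0;
        bool ma = a.found(s, pos, la);
        bool mb = b.found(s, pos, lb);
        if (!ma && !mb) return false;
        len = (ma && (!mb || la >= lb)) ? la : lb;
        return true;
    }
};
```

With named components, Or(Literal("12"), OneOrMore(Digit)) applied to "12345x" at position 0 reports a length of 5, because the digit run is longer than the literal.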
Predefined Find Objects
These are objects of type static const SS::Find that have been initialized with some useful values. They can be used with any member function or SS::Find constructor requiring an object of type SS::Find const & or SS::Find. See Find Predicates.
FindFunc whitespace
    matches any whitespace character
FindFunc blackspace
    matches any human readable character
FindFunc lowercase
    matches a-z
FindFunc uppercase
    matches A-Z
FindFunc alpha
    matches a-z and A-Z
FindFunc digit
    matches 0-9
FindFunc alphanumeric
    matches a-z, A-Z, and 0-9
FindFunc punct
    matches any blackspace that isn't alphanumeric
FindFunc printable
    matches blackspace plus the space character
FindFunc hexdigit
    matches 0-9, a-f, and A-F
FindFunc cntrl
    matches any control character
FindFunc graph
    same as blackspace
FindFunc findtrue
    always a match, length 0
FindFunc findfalse
    never matches
FindFunc anychar
    matches any single character
FindFunc frontposition
    matches only at the beginning of the string
FindFunc backposition
    matches only at the last character of the string
FindFunc endofline
    matches if at the character preceding a one or two character end of line or at the last character of the string
FindFunc singlequotedelimit
    matches the text inside a pair of '
FindFunc doublequotedelimit
    matches the text inside a pair of "
FindFunc parendelimit
    matches the text between ( and )
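Several of the character-class objects above line up with the standard <cctype> classifications (whitespace with isspace, blackspace and graph with isgraph, punct with ispunct). The text does not say how they are actually built, so the following is only a sketch under that assumption. It shows how functions of the documented Func shape, bool(*)(char c), can be written; isWhitespace, isBlackspace, isPunct, and countMatching are hypothetical names.

```cpp
#include <cassert>
#include <cctype>

// Same shape as the documented Func type.
typedef bool (*Func)(char c);

// The <cctype> functions take an int that must be representable as
// unsigned char, so each adapter casts before calling.
bool isWhitespace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
bool isBlackspace(char c) { return std::isgraph(static_cast<unsigned char>(c)) != 0; }
bool isPunct(char c)      { return std::ispunct(static_cast<unsigned char>(c)) != 0; }

// Hypothetical helper: count the characters of a C string accepted by f.
int countMatching(char const* text, Func f) {
    assert(text != 0 && f != 0);
    int n = 0;
    for (; *text != '\0'; ++text) {
        if (f(*text)) ++n;
    }
    return n;
}
```

Any function with this signature could be handed to a FindFunc constructor in the same way.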
Numerical Operations
toX
bool toBool () const;
short toShort () const;
unsigned short toUShort () const;
int toInt () const;
unsigned int toUInt () const;
long toLong () const;
unsigned long toULong () const;
LongLong toLongLong () const;
ULongLong toULongLong () const;
float toFloat () const;
double toDouble () const;
long double toLongDouble () const;
    convert to a number
    base 10 is used, where appropriate
toX (base)
short toShort (int base) const;
unsigned short toUShort (int base) const;
int toInt (int base) const;
unsigned int toUInt (int base) const;
long toLong (int base) const;
unsigned long toULong (int base) const;
LongLong toLongLong (int base) const;
ULongLong toULongLong (int base) const;
    convert to a number, assuming string is in base
toInteger
void toInteger (short & number) const;
void toInteger (unsigned short & number) const;
void toInteger (int & number) const;
void toInteger (unsigned int & number) const;
void toInteger (long & number) const;
void toInteger (unsigned long & number) const;
void toInteger (LongLong & number) const;
void toInteger (ULongLong & number) const;
    convert to a number
    base 10 is used, where appropriate
toInteger (base)
void toInteger (short & number, int base) const;
void toInteger (unsigned short & number, int base) const;
void toInteger (int & number, int base) const;
void toInteger (unsigned int & number, int base) const;
void toInteger (long & number, int base) const;
void toInteger (unsigned long & number, int base) const;
void toInteger (LongLong & number, int base) const;
void toInteger (ULongLong & number, int base) const;
    convert to a number, assuming string is in base
toBase
static SS toBase (short number, int base=10);
static SS toBase (unsigned short number, int base=10);
static SS toBase (int number, int base=10);
static SS toBase (unsigned int number, int base=10);
static SS toBase (long number, int base=10);
static SS toBase (unsigned long number, int base=10);
static SS toBase (LongLong number, int base=10);
static SS toBase (ULongLong number, int base=10);
    create a string, representing number in base
SS toHex () const;
    convert a string, byte by byte, to hex
SS fromHex () const;
    convert hex digits back into bytes
SS outputHex () const;
    pretty print string in hex
long hash () const;
    produce a hash value for the string
static SS sprint (const char * fmt, ...);
    do a sprintf into a new string
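For readers who want to see what a base-aware round trip looks like, the sketch below uses only the standard library. parseLong and formatBase are hypothetical free functions that play roles similar to toLong (int base) and toBase. The lowercase digits, the rejection of trailing characters, and the use of std::invalid_argument in place of SS::ErrorNumberFormat are assumptions of this example, not documented behavior of Super String.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical analogue of toLong(base): parse text written in the given base.
// Range errors reported by strtol through errno are not checked here.
long parseLong(std::string const& text, int base = 10) {
    char* end = 0;
    long value = std::strtol(text.c_str(), &end, base);
    // Reject empty input and trailing characters that are not digits.
    if (end == text.c_str() || *end != '\0') {
        throw std::invalid_argument("not a number: " + text);
    }
    return value;
}

// Hypothetical analogue of toBase: format number in a base from 2 to 36.
std::string formatBase(long number, int base = 10) {
    assert(base >= 2 && base <= 36);
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    bool negative = number < 0;
    // Work on the unsigned magnitude so the most negative value does not overflow.
    unsigned long n = negative ? 0UL - static_cast<unsigned long>(number)
                               : static_cast<unsigned long>(number);
    std::string out;
    do {
        out.insert(out.begin(), digits[n % base]);
        n /= base;
    } while (n != 0);
    if (negative) out.insert(out.begin(), '-');
    return out;
}
```

Formatting a value in some base and parsing the result in the same base should give back the original value, which makes a convenient sanity check for either direction.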
Supported Types
char const * unsigned char const * signed char const * char unsigned char signed char
bool
short unsigned short int unsigned int long unsigned long __int64 unsigned __int64
float double long double
std::string std::vector<char>
template <class T> SS (T const & t);
    construct a Super String from any supported type
template <class T> SS& operator = (T const & t);
    assign an object of a supported type to a string
SS& operator = (SS const & s);
    copy assignment must be explicitly defined
int compare (X x) const;
    built in compares that can be used for relational operator support
void assign (X x);
    built in assigns that can be used for assignment support
template <class T> inline static SS toSS (T const & t);
    explicitly construct a string from a supported type
template <class T> SS const & toType (T & t);
    convert a string to a supported type
void SSconvert (X x, SS & s);
    used by template construction, assignment, and concatenation
    the string representation of x should be assigned to s
void SSconvertFrom (SS const & s, X& x);
    used by template conversion to X
    the value of s as X should be assigned to x
int SScompare (SS const & s, X x);
    used by the template relational operators
    return less than zero, zero, or greater than zero for s less than, equal, or greater than x
template <class T> SS operator + (T const & t);
template <class T> SS& operator += (T const & t);
template <class T> inline SS operator + (T const & u, SS const & w);
template <class T> bool operator == (T const & t) const;
template <class T> bool operator != (T const & t) const;
template <class T> bool operator < (T const & t) const;
template <class T> bool operator > (T const & t) const;
template <class T> bool operator <= (T const & t) const;
template <class T> bool operator >= (T const & t) const;
template <class T> inline bool operator == (T const & u, SS const & w);
template <class T> inline bool operator != (T const & u, SS const & w);
template <class T> inline bool operator < (T const & u, SS const & w);
template <class T> inline bool operator > (T const & u, SS const & w);
template <class T> inline bool operator <= (T const & u, SS const & w);
template <class T> inline bool operator >= (T const & u, SS const & w);
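The free functions SSconvert and SScompare suggest an extension pattern: member templates forward to overloaded functions, so a new type becomes supported once the overloads exist. The sketch below shows that pattern in isolation and is not Super String's code. MiniString stands in for SS, miniConvert and miniCompare play the roles of SSconvert and SScompare, and Celsius is a made-up user type. Comparing an int through its decimal text is an assumption of this example only.

```cpp
#include <cassert>
#include <string>

struct MiniString;  // hypothetical stand-in for SS

// Overloads for int, declared before the templates that call them.
void miniConvert(int x, MiniString& s);
int miniCompare(MiniString const& s, int x);

struct MiniString {
    std::string rep;
    MiniString() {}
    // Works for any type T that has a matching miniConvert overload.
    template <class T> MiniString(T const& t) { miniConvert(t, *this); }
    // Relational operators forward to miniCompare, which returns a value
    // less than zero, zero, or greater than zero.
    template <class T> bool operator==(T const& t) const { return miniCompare(*this, t) == 0; }
    template <class T> bool operator<(T const& t) const { return miniCompare(*this, t) < 0; }
};

void miniConvert(int x, MiniString& s) { s.rep = std::to_string(x); }
int miniCompare(MiniString const& s, int x) {
    // Assumption: compare the text against the decimal form of x.
    return s.rep.compare(std::to_string(x));
}

// A user-defined type becomes "supported" by adding the two overloads.
struct Celsius { int degrees; };
void miniConvert(Celsius c, MiniString& s) { s.rep = std::to_string(c.degrees) + "C"; }
int miniCompare(MiniString const& s, Celsius c) {
    MiniString other(c);  // reuse the conversion, then compare text
    return s.rep.compare(other.rep);
}
```

The overloads for Celsius are found by argument-dependent lookup when the templates are instantiated, so they can be declared after the string class itself.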
Bounds Checking and Adjusting
Bounds Checking
All single character accesses, i.e. SS::operator[](int ind) and SS::at(int ind), are checked to verify that ind is non-negative and less than the string's length. This check has proven useful for finding coding errors. The performance hit for this feature still needs to be addressed through benchmarking and optimization.
Bounds Adjusting
Exceptions
Catching with catch (SS::Error error) will preserve the descriptive message.
Exception    Thrown When or By
SS::ErrorBadArg
    any inappropriate use of SS::maxindex, SS::fullength, SS::allitems, or SS::notfound
    see Constants
SS::ErrorBadState
    if internal flags are inconsistent
SS::ErrorNumberFormat
    toInteger, toBase, toBool, toInt, toDouble etc.
SS::ErrorOutOfRange
    operator[], at, and dup
    see Bounds Checking
    some of the SS::Find subclasses that require positive or non-negative integers
SS::ErrorOverflow
    sprint, if a buffer overflow is detected
SS::ErrorNotFound
    findNextString, findPrevString, findNextSubString, and findPrevSubString
Typedefs
typedef Pair<int> BegLen;
    some methods use a single object for start and length values
    also known as a range, see String Operations
typedef std::vector< Pair<int> > BegLenVect;
    so a single parameter can represent multiple ranges
typedef std::vector<int> PosVect;
    so a single parameter can represent multiple positions
typedef __int64 LongLong;
typedef unsigned __int64 ULongLong;
    64 bit integral type
    also known as long long on some platforms
Constants
notfound
    if the various search methods that return a position fail they will return this value instead
maxindex
    represents the last character in the string in contexts that require an index or starting position
fullength
    represents a length that makes a range as large as possible for an append or a paste operation
allitems
    when given as a value of count all items will be matched/removed/replaced
static const char nullchar;
    has the value '\000'
    note that a plain 0 has type int
static const Pair<int> nomatch;
    returned by findNextMatch and findPrevMatch when no match is found
Supporting Classes
Buffer (int n=0, SS const & fillvalue="");
    create a new string of length n, zero terminated, with a possible fill value
Buffer (void const * start, int n);
    assign start directly to Super String's char* representation; assign n directly to the length
SubSS (SS* s, int beg, int len);
    a SubSS has a reference to a range in a Super String
SS get () const;
    produce a new string with the value of SubSS's range
SS& operator = (SS s);
    assign s to the given range of a string, mediated by SubSS
Pair (int t0, int t1);
    construct a pair of ints
int _0;
    data member for t0
int _1;
    data member for t1
STL Compatibility
char * begin ();
char const * begin () const;
    return an iterator pointing to the beginning of the string
char * end ();
char const * end () const;
    return an iterator pointing one past the end of the string
bool Find::operator () (char c);
    allows an SS::Find object to be used as an STL predicate
    only works for single character compares
    see Find Predicates
std::string toString () const;
    convert the Super String to std::string
std::vector<char> toVector () const;
    convert the Super String to std::vector<char>
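Because begin and end return plain character pointers, standard algorithms can be applied to the string directly. The following sketch demonstrates the idea with a hypothetical MiniString class in place of SS. IsUpper is a made-up single-character predicate in the spirit of Find::operator () (char c), and countUpper and reverseInPlace are illustrative helpers, not library functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for SS that exposes pointer iterators.
class MiniString {
    std::vector<char> rep_;
public:
    explicit MiniString(std::string const& s) : rep_(s.begin(), s.end()) {}
    char* begin() { return rep_.data(); }
    char const* begin() const { return rep_.data(); }
    char* end() { return rep_.data() + rep_.size(); }
    char const* end() const { return rep_.data() + rep_.size(); }
    std::string toString() const { return std::string(begin(), end()); }
};

// Single-character predicate usable wherever an STL algorithm expects one.
struct IsUpper {
    bool operator()(char c) const {
        return std::isupper(static_cast<unsigned char>(c)) != 0;
    }
};

// Count uppercase characters through the const iterator range.
int countUpper(MiniString const& s) {
    return static_cast<int>(std::count_if(s.begin(), s.end(), IsUpper()));
}

// Reverse the characters in place through the mutable iterator range.
void reverseInPlace(MiniString& s) {
    std::reverse(s.begin(), s.end());
}
```

Only single-character predicates fit this style, since the algorithms hand the predicate one character at a time and have no way to report a match length.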
This article was originally published on March 29, 1999