Super String

Last update: 3/29/99

Synopsis: A string class that includes many operations, buffer functionality, numeric conversions, complex searching, and of course char* compatibility.


Contents

Overview – what’s it all about
Basic Operations – typical string class construction and access
String Operations – substrings, cut and paste, and such
Buffer Operations – it’s only memory
Searching and Matching – finding things
Find Predicates – concerning the type SS::Find
Predefined Find Objects – whitespace, uppercase, etc.
Numerical Operations – formatting to/from numbers
Supported Types – types that can be converted and compared
Bounds Checking and Adjusting – when it does one of these
Exceptions – who throws what and when
Typedefs – a few type aliases
Constants – some constants
Supporting Classes – call them intermediate data types
STL Compatibility – mostly iterators
C Compatibility – the lowest common denominator

Overview

This class works pretty much like any other string class: feed it string literals, index the characters, let it do the char* memory management. Beyond that it offers range-based string operations, buffer operations, searching and matching, and numeric conversions.


Basic Operations

This is standard fare for C++ string classes. The functionality of length, size, empty, operator [], at, c_str, data, and erase is compatible with std::string. Both operator [] and at can throw an exception; see Exceptions.

Basic Operations
SS (); construct a Super String with a value of ""
template <class T> SS (T const & t); construct a Super String for any supported type
SS (SS const & s); copy constructor for Super String
SS (void const * v, int n); construct a Super String by copying n bytes starting at v
int length () const;
int size () const;
the number of characters in the string
bool empty () const; returns true if the string is empty; false otherwise
an empty string will have a length of zero and a value of ""
char & operator [] (int ind);
char const & operator [] (int ind) const;
return a reference to the character at position ind
char & at (int ind);
char const & at (int ind) const;
same as operator [] (int)
operator char * ();
operator char const * () const;
returns a pointer to the first char of the string
see C Compatibility for more
char * c_str ();
char const * c_str () const;
char * data ();
char const * data () const;
same as operator char * ()
SS clone () const; return a copy of the string
SS& erase (); erase the string, i.e. set it to ""
return a reference to the erased string


String Operations

These methods access or manipulate ranges of characters, such as:

  • substring – a range of characters that can be copied or assigned to.
  • get – copy a range of characters.
  • set – change some of a string’s characters.
  • cut – remove and return a range of characters.
  • paste – insert some characters.
  • removeRange – delete a range of characters.
  • replaceRange – replace a range of characters with some others.
  • reverse – reverse the order.
  • sort – sort by character value.
  • fill – fill a range with some value.
  • repeat – fill with a string.

They operate on a range of characters. There are three kinds of ranges:

int beg, int len – beg is the starting position, len the number of characters.
Pair<int> beglen – beglen._0 is the start and beglen._1 the length. See Supporting Classes.
std::vector <Pair<int>> beglen – multiple ranges to be operated on at once.

Some methods can take any of these kinds of ranges as input. Methods that return ranges do so as either Pair<int> or std::vector <Pair<int>> beglen.
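The three kinds of ranges can be sketched without Super String itself. The example below is only a model built on std::string: `Range` is a hypothetical stand-in for Pair<int> (using first/second rather than _0/_1), and `getRange` is a made-up name that imitates how get is described, including concatenation of multiple ranges.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Pair<int>: first = start, second = length.
using Range = std::pair<int, int>;

// Copy one range given as beg/len.
std::string getRange(const std::string& s, int beg, int len) {
    assert(beg >= 0 && len >= 0 && beg + len <= static_cast<int>(s.size()));
    return s.substr(static_cast<std::size_t>(beg), static_cast<std::size_t>(len));
}

// Copy one range given as a single pair.
std::string getRange(const std::string& s, const Range& r) {
    return getRange(s, r.first, r.second);
}

// Copy several ranges and concatenate them in the order given.
std::string getRange(const std::string& s, const std::vector<Range>& ranges) {
    std::string out;
    for (const Range& r : ranges) out += getRange(s, r);
    return out;
}
```

With the text "hello world", the range (6, 5) selects "world", and the two ranges (0, 1) and (6, 1) together yield "hw".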

firstChar and lastChar can throw an exception, see Exceptions.

String Operations
substring
SubSS sub (int beg, int len);
SS sub (int beg, int len) const;
SubSS operator () (int beg, int len);
SS operator () (int beg, int len) const;
SubSS operator () (Pair<int> const & beglen);
SS operator () (Pair<int> const & beglen) const;
return a substring representing a range of characters
a substring of a non-constant string can be assigned to
see Supporting Classes for more on SS::SubSS
get
SS get (int beg, int len=1) const;
SS getFrom (int beg) const;
SS get (Pair<int> const & beglen) const;
SS get (std::vector< Pair<int> > const & beglen) const;
std::vector<SS>& get (std::vector<SS>& s, std::vector< Pair<int> > const & beglen) const;
copy a range of characters into a new string
getFrom gets all the way to the end of the string
get(std::vector< Pair<int> > const & beglen) concatenates the ranges
get(std::vector<SS>& s, std::vector< Pair<int> > const & beglen),
stores each range as an element of s and returns s
set
SS& set (SS const & s, int pos=0);
SS& set (SS const & s, int pos, int beg, int len);
SS& set (SS const & s, int pos, Pair<int> const & beglen);
SS& set (SS const & s, std::vector<int> const & pos);
SS& set (std::vector<SS> const & s, std::vector<int> const & pos);
assign new values to a range of characters starting at pos
beg, len, and beglen refer to the argument string s
set does not extend the string
the updated string is returned
the last two forms set s, or the corresponding element of s, at each position in pos
cut
SS cut (int beg, int len=1);
SS cutFrom (int beg);
SS cut (Pair<int> const & beglen);
SS& cut (SS& s, int beg, int len=1);
SS& cut (SS& s, Pair<int> const & beglen);
std::vector<SS>& cut (std::vector<SS>& s, std::vector< Pair<int> > const & beglen);
delete a range of characters and return them
beg and len or beglen specify the range or ranges to be cut
cutFrom cuts all the way to the end of the string
either a new string or s is returned
the last form makes multiple cuts at once
paste
SS& paste (SS const & s, int pos=0);
SS& paste (SS const & s, int pos, int beg, int len);
SS& paste (SS const & s, int pos, Pair<int> const & beglen);
SS& paste (SS const & s, std::vector<int> const & pos);
SS& paste (std::vector<SS> const & s, std::vector<int> const & pos);
insert s before the position pos
beg, len, and beglen refer to the argument string s
a value for pos of the string’s length or SS::fullength results in an append
the updated string is returned
the last two forms paste s, or the corresponding element of s, at each position in pos
removeRange
SS& removeRange (int beg=0, int len=fullength);
SS& removeRange (Pair<int> const & beglen);
SS& removeRange (std::vector< Pair<int> > const & beglen);
delete the specified range or ranges of characters
return the updated string
replaceRange
SS& replaceRange (SS const & newseq, int beg=0, int len=fullength);
SS& replaceRange (SS const & newseq, Pair<int> const & beglen);
SS& replaceRange (SS const & newseq, std::vector< Pair<int> > const & beglen);
SS& replaceRange (std::vector<SS> const & s, std::vector< Pair<int> > const & beglen);
replace a range of characters with newseq
beg and len or beglen specify the range or ranges to be replaced
the updated string is returned
the last form replaces each range in beglen with the corresponding element of s
reverse
SS& reverse (int beg=0, int len=fullength);
SS& reverse (Pair<int> const & beglen);
SS& reverse (std::vector< Pair<int> > const & beglen);
SS& itemReverse (std::vector< Pair<int> > const & beglen);
SS& tailReverse (int len);
reverse the order of characters within a range or ranges
with multiple ranges each range is considered distinct
itemReverse reverses the ranges considered as distinct strings,
rather than as individual characters
tailReverse reverses the last len characters
return the updated string
sort
SS& sort (int beg=0, int len=fullength);
SS& sort (Pair<int> const & beglen);
SS& sort (std::vector< Pair<int> > const & beglen);
SS& itemSort (std::vector< Pair<int> > const & beglen);
SS& tailSort (int len);
sort the characters within a range or ranges
with multiple ranges each range is considered distinct
itemSort sorts the ranges considered as distinct strings,
rather than as individual characters
tailSort sorts the last len characters
return the updated string
fill
SS& fill (char c, int beg=0, int len=fullength);
SS& fill (char c, Pair<int> const & beglen);
SS& fill (char c, std::vector< Pair<int> > const & beglen);
set each character in the range or ranges to c
return the updated string
repeat
SS& repeat (SS const & s, int beg=0, int len=fullength);
SS& repeat (SS const & s, Pair<int> const & beglen);
SS& repeat (SS const & s, std::vector< Pair<int> > const & beglen);
like fill except with a string argument
does not change the length of the string
SS head (int len) const; return up to the first len characters of the string
SS tail (int len) const; return up to the last len characters of the string
char & firstChar ();
char const & firstChar () const;
return a reference to the first character of the string
char & lastChar ();
char const & lastChar () const;
return a reference to the last character of the string
bool isUpperCase (int pos) const;
bool isLowerCase (int pos) const;
bool isWhiteSpace (int pos) const;
bool isBlackSpace (int pos) const;
bool isAlpha (int pos) const;
bool isDigit (int pos) const;
bool isAlphaNumeric (int pos) const;
bool isPunct (int pos) const;
bool isPrintable (int pos) const;
bool isHexDigit (int pos) const;
bool isCntrl (int pos) const;
bool isGraph (int pos) const;
determine a classification of the character at pos
these are as per clib isspace, isupper, etc.
SS& toLower ();
SS& toLower (int pos);
convert the character at pos or the entire string to lowercase
return the updated string
SS& toUpper ();
SS& toUpper (int pos);
convert the character at pos or the entire string to uppercase
return the updated string
int compare (X x) const; compare string to an object x
see Supported Types
int compareNoCase (SS const & s) const; do a case insensitive compare
int compare (void const * v, int n) const; compare string to the buffer v with a length of n characters
bool compare (Find const & f) const; determine if the entire string is matched by f
see Find Predicates
SS trim () const; remove leading and trailing whitespace
SS dup (int n=1) const; create a string that duplicates the current string n times
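As a rough illustration of how cut and paste relate, here is a sketch on std::string. The function names `cutRange` and `pasteAt` are hypothetical and only imitate the behavior described in the table above: cut removes a range and hands it back, paste inserts before a position, and a position equal to the length appends.

```cpp
#include <cassert>
#include <string>

// Remove len characters starting at beg and return the removed text.
std::string cutRange(std::string& s, int beg, int len) {
    assert(beg >= 0 && len >= 0 && beg + len <= static_cast<int>(s.size()));
    std::string removed = s.substr(static_cast<std::size_t>(beg),
                                   static_cast<std::size_t>(len));
    s.erase(static_cast<std::size_t>(beg), static_cast<std::size_t>(len));
    return removed;
}

// Insert piece before position pos; pos equal to the length appends.
std::string& pasteAt(std::string& s, const std::string& piece, int pos) {
    assert(pos >= 0 && pos <= static_cast<int>(s.size()));
    s.insert(static_cast<std::size_t>(pos), piece);
    return s;
}
```

Cutting (5, 6) from "hello world" leaves "hello" and returns " world"; pasting ", there" at position 5 then gives "hello, there".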


Buffer Operations

A buffer is either:

  • a block of memory, ready for some other operation or agent to use.
  • an arbitrary region of memory, on which string operations may be performed in place.

What constitutes a buffer operation is a little loose:

  • some constraint for being well behaved is violated.
  • value semantics are violated.
  • primarily concerned with raw bytes.
  • getting bytes in or out of the string in an unusual or low-level manner.
  • directly accessing a Super String’s representation.

A region (also reference) has a representation that points to an internal part of its parent string. The advantages are that no copying need be done to refer to a part of a string and that the parent may be manipulated via this part of itself, perhaps with simplified logic. A region is a buffer in that there is not an extra null-termination byte. Also a copy of a reference is a reference not a separate string. It is ill-behaved in that it is invalidated when the parent goes out of scope, is deleted, or assigned a different value. Value semantics are violated here.
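The idea of a region can be modeled in a few lines. This is a sketch under assumptions, not Super String's representation: `Region`, `makeRegion`, and `upperCase` are hypothetical names, and the parent is a std::string. It shows the two properties named above: nothing is copied, and changes made through the region appear in the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of a region: a pointer into the parent plus a length.
// It owns nothing and has no terminating null of its own.
struct Region {
    char* start;
    std::size_t length;
};

// Make a region over part of parent; no characters are copied.
// The region is only valid while parent is alive and not reassigned.
Region makeRegion(std::string& parent, std::size_t beg, std::size_t len) {
    assert(beg + len <= parent.size());
    return Region{&parent[0] + beg, len};
}

// Upper-case ASCII letters through the region; the parent sees the change.
void upperCase(Region r) {
    for (std::size_t i = 0; i < r.length; ++i) {
        char& c = r.start[i];
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 'a' + 'A');
    }
}
```

Copying a `Region` copies only the pointer and length, so the copy still refers to the same parent characters, which mirrors the remark that a copy of a reference is a reference.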

Note: to create a buffer of, say, size 10, use SS s (SS::Buffer(10)). The expression SS s(10) will instead produce the string “10”.
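The distinction rests on a separate argument type selecting a different constructor. The following miniature class is a hypothetical illustration of that technique only; `BufferSpec` and `MiniString` are invented names, and filling the buffer with null bytes is an assumption made for the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag type playing the role of SS::Buffer.
struct BufferSpec {
    int size;
    explicit BufferSpec(int n) : size(n) { assert(n >= 0); }
};

// Hypothetical miniature string class with both constructors.
class MiniString {
public:
    // A plain int is formatted as decimal text.
    explicit MiniString(int value) : text_(std::to_string(value)) {}
    // A BufferSpec asks for that many bytes of storage (null-filled here).
    explicit MiniString(const BufferSpec& b)
        : text_(static_cast<std::size_t>(b.size), '\0') {}
    const std::string& text() const { return text_; }
private:
    std::string text_;
};
```

Here `MiniString(10)` holds the two characters "10", while `MiniString(BufferSpec(10))` holds ten bytes.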

When Super String allocates the memory for its representation, it always allocates an additional byte for zero termination. Since a non-zero-terminated Super String can easily be constructed with these methods, generalized code that handles Super Strings should not assume zero termination.

Buffer Operations
SS (Buffer const & b); construct a Super String via an SS::Buffer object
see Supporting Classes
currently implemented by template, see Supported Types
SS::Buffer (int n=0, SS const & fillvalue=""); direct SS (Buffer const & b) to create a string of length n,
possibly filled with fillvalue
SS::Buffer (void const * start, int n); direct SS (Buffer const & b) to assign start and n to the string’s internal representation
see SS::buffer (void const * v, int n)
SS& buffer (int n); change string to hold n characters
current contents are destroyed; buffer is zero terminated
returns the altered string
SS& buffer (void const * v, int n); assign v and n to the string’s internal representation
no memory is allocated or copied
the string will not try to free v when it goes out of scope
the string is not guaranteed to be zero terminated
returns the altered string
SS& resize (int n); shrink or expand the string
current contents are preserved
returns the altered string
SS& resizeToNullTerminator (); resize the string, keeping the contents up to the first embedded null
SS getRegion (int beg=0, int len=fullength) const;
SS getRegion (Pair<int> const & beglen) const;
std::vector<SS>& getRegion (std::vector<SS>& s, std::vector< Pair<int> > const & beglen) const;
create a string that is a reference, or region, to a part of its parent
beg and len or beglen is the range to which the region refers
the last creates a region for each range in beglen
char* extract (); take over the memory management for the internal representation
the internal rep will not be deleted when the string goes out of scope
returns the char* internal rep
SS& zero ();
SS& zero (int beg, int len);
fill the string or a range with the null character, '\0'
template <class T> static inline void zero (T& t); zero out an arbitrary object
use SS::zero() or SS::zero(int,int) for a Super String
the class T should not have any virtual functions
template <class T> static inline void fillObject (T& t, char c); fill an arbitrary object with c
use SS::fill(char,..) for a Super String
the class T should not have any virtual functions
template <class T> static inline SS fromObject (T& t); create a string by copying the bytes from t
SS const & copyTo (void* dst, int n=fullength, int beg=0) const; copy bytes from the string to dst
n is the number of bytes; beg is the starting position in the string
returns the unaltered string
SS & copyFrom (void* src, int n=fullength, int beg=0); copy bytes from src to the string
up to n bytes will be copied to the string starting at beg
the string will not be extended
returns the altered string
SS & assignFrom (void* src, int n); copy n bytes to the string, starting from src
current contents of the string are lost
returns the altered string
template <class T> SS const & copyToObject (T& t, int beg=0) const; copy bytes from the string to an arbitrary object
beg is the starting position in the string
returns the unaltered string
template <class T> SS & copyFromObject (T& t, int beg=0); copy bytes from an arbitrary object into the string
beg is the starting position in the string
the string will not be extended
returns the altered string
template <class T> SS & assignFromObject (T& t); copy the bytes from an arbitrary object into the string
current contents of the string are lost
returns the altered string
SS& become (SS& s); take over the contents of s
returns the altered string
SS& swap (SS& s); swap contents with s
returns the altered string
SS& swap (int pos0, int pos1); swap the values at pos0 and pos1
returns the altered string
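The object-copying members amount to moving raw bytes between an object and a string. A minimal sketch of that idea on std::string follows; `bytesFromObject` and `bytesToObject` are hypothetical names, and the static_assert on trivially copyable types is a stricter condition than the "no virtual functions" rule stated above, chosen here so the byte copy is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

// Copy the bytes of t into a new string (imitates fromObject).
template <class T>
std::string bytesFromObject(const T& t) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    return std::string(reinterpret_cast<const char*>(&t), sizeof(T));
}

// Copy sizeof(T) bytes from s, starting at beg, into t (imitates copyToObject).
template <class T>
void bytesToObject(const std::string& s, T& t, std::size_t beg = 0) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    assert(beg + sizeof(T) <= s.size());
    std::memcpy(&t, s.data() + beg, sizeof(T));
}

// Hypothetical plain struct used to try the round trip.
struct Point {
    int x;
    int y;
};
```

A round trip through the string reproduces the object: the string holds exactly sizeof(T) bytes, embedded nulls included, which is why such a string should be treated as a buffer rather than as text.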


Searching and Matching

These are methods that find sequences of characters in a string that either match another string or satisfy some criterion.

A view of the methods, from a distance:

  • findNext – find the position of the next match.
  • findNextMatch – same, but also return the length of the match.
  • findNextString – likewise, but return the match as a string.
  • find – more elaborate, the other finds are based on this.
  • contains – it does if you can find it once.
  • population – total number of occurrences.
  • remove – deletes all matches.
  • replace – replaces all matches.
  • removeForward – give a maximum on the number of deletions.
  • replaceForward – give a maximum on the number of replacements.
  • match – either verify a match or return all matches.
  • matchForward – give a maximum on the number of matches.
  • tokenize – separate out tokens based on a delimiter.

A find always has a starting point and a direction (Next or Prev). Some have additional parameters to restrict the search or return values. Matching can take place at either a single location in a string or over a range of positions. Most matching is directionless; all matches are found and returned.

Many of the parameters take default values. A forward search will start by default at the beginning of the string; a backward the end. Ranges will default to the whole string or the rest of the string. See Constants for more. For an explanation of ranges see String Operations.
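Starting point and direction can be illustrated with std::string's own find and rfind. This is a sketch, not Super String code: `findNextIn` and `findPrevIn` are hypothetical names, `kNotFound` stands in for SS::notfound (whose actual value is not given here), and a negative beg is used as a stand-in for "start at the end".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int kNotFound = -1;  // hypothetical stand-in for SS::notfound

// Search forward from beg; return the first position where seq matches.
int findNextIn(const std::string& s, const std::string& seq, int beg = 0) {
    assert(beg >= 0);
    std::size_t pos = s.find(seq, static_cast<std::size_t>(beg));
    return pos == std::string::npos ? kNotFound : static_cast<int>(pos);
}

// Search backward from beg; a negative beg means "start at the end".
// Returns the last match position that is not after beg.
int findPrevIn(const std::string& s, const std::string& seq, int beg = -1) {
    std::size_t start = beg < 0 ? std::string::npos
                                : static_cast<std::size_t>(beg);
    std::size_t pos = s.rfind(seq, start);
    return pos == std::string::npos ? kNotFound : static_cast<int>(pos);
}
```

In "abcabc" a forward search for "bc" from the default start finds position 1, while a backward search from the default start finds position 4.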

Searching Operations
findNext
int findNext (SS const & sequence, int beg=0, SS& result=nullref) const;
int findPrev (SS const & sequence, int beg=maxindex, SS& result=nullref) const;
int findNext (Find const & finder, int beg=0, SS& result=nullref) const;
int findPrev (Find const & finder, int beg=maxindex, SS& result=nullref) const;
int findNextNoCase (SS const & sequence, int beg=0, SS& result=nullref) const;
int findPrevNoCase (SS const & sequence, int beg=maxindex, SS& result=nullref) const;
int findNextOf (SS const & charset, int beg=0, SS& result=nullref) const;
int findPrevOf (SS const & charset, int beg=maxindex, SS& result=nullref) const;
starting at beg, try each position until a match is found and return that position
if no match is found return SS::notfound
the matching characters are returned in result, if one is supplied
the Next methods search forward, the Prev backward
the NoCase versions do a case insensitive compare
findNextOf and findPrevOf search for any character in charset
findNextMatch
Pair<int> findNextMatch (SS const & sequence, int beg=0) const;
Pair<int> findPrevMatch (SS const & sequence, int beg=maxindex) const;
Pair<int> findNextMatch (Find const & finder, int beg=0) const;
Pair<int> findPrevMatch (Find const & finder, int beg=maxindex) const;
Pair<int> findNextNoCaseMatch (SS const & sequence, int beg=0) const;
Pair<int> findPrevNoCaseMatch (SS const & sequence, int beg=maxindex) const;
searching is performed in the same manner as described in findNext
the return value is the range in the string of the match
if no match is found SS::nomatch is returned.
findNextString
SS findNextString (SS const & sequence, int beg=0 ) const;
SS findPrevString (SS const & sequence, int beg=maxindex) const;
SS findNextString (Find const & finder, int beg=0 ) const;
SS findPrevString (Find const & finder, int beg=maxindex) const;
SubSS findNextSubString (SS const & sequence, int beg=0 );
SubSS findPrevSubString (SS const & sequence, int beg=maxindex);
SubSS findNextSubString (Find const & finder, int beg=0 );
SubSS findPrevSubString (Find const & finder, int beg=maxindex);
searching is performed in the same manner as described in findNext
the return value is the string that was found
if no match was found throw SS::ErrorNotFound
the String and SubString versions return Super String and SS::SubSS, respectively
find
int find (Find const & finder, int beg, int* len=0, int end=fullength, int inc=1) const;
int rfind (Find const & finder, int beg, int* len=0, int end=fullength) const;
int find (SS const & sequence, int beg, int* len=0, int end=fullength, int inc=1) const;
int rfind (SS const & sequence, int beg, int* len=0, int end=fullength) const;
beg is the first position checked, end is the last
inc is the stepsize of the search, with rfind using -1
returns SS::notfound if no match is found at the positions tried
the match length is returned in len, if supplied
a value of SS::fullength for end refers to the beginning or end of the string depending upon direction
contains
bool contains (SS const & sequence, int beg=0, int len=fullength) const;
bool contains (Find const & finder, int beg=0, int len=fullength) const;
bool contains (SS const & sequence, Pair<int> const & beglen) const;
bool contains (Find const & finder, Pair<int> const & beglen) const;
bool contains (SS const & sequence, std::vector< Pair<int> > const & beglen) const;
bool contains (Find const & finder, std::vector< Pair<int> > const & beglen) const;
returns true if the sequence or finder is found anywhere in the range or ranges; false otherwise
population
int population (SS const & sequence, int beg=0, int len=fullength) const;
int population (Find const & finder, int beg=0, int len=fullength) const;
int population (SS const & sequence, Pair<int> const & beglen) const;
int population (Find const & finder, Pair<int> const & beglen) const;
int population (SS const & sequence, std::vector< Pair<int> > const & beglen) const;
int population (Find const & finder, std::vector< Pair<int> > const & beglen) const;
counts the number of occurrences of sequence or finder in the range or ranges
returns the total found
remove
SS& remove (SS const & oldseq, int beg=0, int len=fullength);
SS& remove (Find const & finder, int beg=0, int len=fullength);
SS& remove (SS const & oldseq, Pair<int> const & beglen);
SS& remove (Find const & finder, Pair<int> const & beglen);
SS& remove (SS const & oldseq, std::vector< Pair<int> > const & beglen);
SS& remove (Find const & finder, std::vector< Pair<int> > const & beglen);
deletes every occurrence of oldseq or finder in the range or ranges
returns the altered string
replace
SS& replace (SS const & oldseq, SS const & newseq, int beg=0, int len=fullength);
SS& replace (Find const & finder, SS const & newseq, int beg=0, int len=fullength);
SS& replace (SS const & oldseq, SS const & newseq, Pair<int> const & beglen);
SS& replace (Find const & finder, SS const & newseq, Pair<int> const & beglen);
SS& replace (SS const & oldseq, SS const & newseq, std::vector< Pair<int> > const & beglen);
SS& replace (Find const & finder, SS const & newseq, std::vector< Pair<int> > const & beglen);
replaces every occurrence of oldseq or finder in the range or ranges with newseq
returns the altered string
removeForward
int removeForward (SS const & oldseq, int count=1, int beg=0, int len=fullength);
int removeForward (Find const & finder, int count=1, int beg=0, int len=fullength);
int removeBackward (SS const & oldseq, int count=1, int beg=0, int len=fullength);
int removeBackward (Find const & finder, int count=1, int beg=0, int len=fullength);
directional remove, up to count items are deleted in the range
returns the number of items actually deleted
the Forward version starts at the beginning of the range; the Backward the end
count can be SS::allitems
replaceForward
int replaceForward (SS const & oldseq, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceForward (Find const & finder, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceBackward (SS const & oldseq, SS const newseq, int count=1, int beg=0, int len=fullength);
int replaceBackward (Find const & finder, SS const newseq, int count=1, int beg=0, int len=fullength);
just like removeForward except replace matches with newseq
match (at a single position)
bool match (SS const & sequence, int pos, SS& result=nullref) const;
bool match (Find const & finder, int pos, SS& result=nullref) const;
if sequence or finder match at position pos return true; false otherwise
if there is a match return the matched characters in result, if supplied
match (over a range)
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, Pair<int> const & beglen_src) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, Pair<int> const & beglen_src) const;
std::vector< Pair<int> >& match (SS const & sequence, std::vector< Pair<int> >& beglen, std::vector< Pair<int> > const & beglen_src) const;
std::vector< Pair<int> >& match (Find const & finder, std::vector< Pair<int> >& beglen, std::vector< Pair<int> > const & beglen_src) const;
find all the matches in the range or ranges and store them in beglen
the return value is also beglen
if no matches are found beglen will have a size of zero
matchForward
std::vector< Pair<int> >& matchForward (SS const & sequence, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchForward (Find const & finder, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchBackward (SS const & sequence, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
std::vector< Pair<int> >& matchBackward (Find const & finder, std::vector< Pair<int> >& beglen, int count=1, int beg=0, int len=fullength) const;
directional match, up to count items are found in the range
the number of items actually found is the size of the returned beglen
beglen is as in match
the Forward version starts at the beginning of the range; the Backward the end
count can be SS::allitems
tokenize
std::vector< Pair<int> >& tokenize (std::vector< Pair<int> >& beglen) const;
std::vector< Pair<int> >& tokenize (SS const & delimeter, std::vector< Pair<int> >& beglen) const;
std::vector< Pair<int> >& tokenize (Find const & finder, std::vector< Pair<int> >& beglen) const;
break string into tokens and store starting positions and lengths in beglen
use whitespace, a string, or an SS::Find object to delimit tokens
returns beglen
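To make the shape of tokenize's result concrete, here is a sketch of whitespace tokenizing that reports start/length pairs instead of copying the tokens. It is a model on std::string: `Range` stands in for Pair<int>, `tokenizeRanges` is a hypothetical name, and clearing the output vector first is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Pair<int>: first = start, second = length.
using Range = std::pair<int, int>;

// Split s on whitespace and store each token's start and length in out.
std::vector<Range>& tokenizeRanges(const std::string& s, std::vector<Range>& out) {
    out.clear();  // assumption: earlier contents are discarded
    const int n = static_cast<int>(s.size());
    int i = 0;
    while (i < n) {
        // Skip the delimiters in front of the next token.
        while (i < n && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
        const int beg = i;
        // Advance to the end of the token.
        while (i < n && !std::isspace(static_cast<unsigned char>(s[i]))) ++i;
        assert(i >= beg);
        if (i > beg) out.push_back(Range(beg, i - beg));
    }
    return out;
}
```

The resulting ranges can then be passed to any range-taking operation, so the tokens are only copied if and when they are actually needed.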


Find Predicates

How to use it

These predicates are for use with the various searching and matching functions in Super String (see Searching and Matching). Predicate objects have the base type SS::Find and may be created and stored, or simply created as temporaries when the desired Super String member function is called. They are nested classes of Super String. Predicates that take arguments of type SS::Find const & allow composition of arbitrarily complex predicates. New predicate classes may be derived from SS::Finder. They needn’t be nested classes of Super String and no privileged access is required. There are some predefined find objects to perform a few simple searches.

How it works

A member function using a find predicate will typically examine a range of positions in the string. At each one the predicate’s found method will be invoked:

virtual bool found (SS const & s, int pos, int& len) const;

where s is usually *this and pos is the position for which the match is being checked. If the predicate reports a match it will return the number of characters matched in len.
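The calling pattern can be sketched as follows. This is a stand-alone model, not the library's code: `Predicate`, `DigitRun`, and `findFirst` are hypothetical names, std::string replaces SS, and -1 is an assumed "not found" value. Only the shape of found (string, position, length out-parameter) is taken from the signature above.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate interface modeled on the found() signature.
struct Predicate {
    virtual ~Predicate() {}
    // Return true on a match at pos and store the matched length in len.
    virtual bool found(const std::string& s, int pos, int& len) const = 0;
};

// Example predicate: a run of one or more decimal digits starting at pos.
struct DigitRun : Predicate {
    bool found(const std::string& s, int pos, int& len) const override {
        const int n = static_cast<int>(s.size());
        int i = pos;
        while (i < n && s[i] >= '0' && s[i] <= '9') ++i;
        if (i == pos) return false;
        len = i - pos;
        return true;
    }
};

// Try the predicate at each position from beg onward.
// Returns the first matching position, or -1 (assumed) if none matches.
int findFirst(const std::string& s, const Predicate& p, int beg, int& len) {
    assert(beg >= 0);
    for (int pos = beg; pos < static_cast<int>(s.size()); ++pos)
        if (p.found(s, pos, len)) return pos;
    return -1;
}
```

Searching "ab123cd" with `DigitRun` reports position 2 with a length of 3; the predicate decides the length, the search loop only decides where to try.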

Predicate objects should be stored and used as objects of SS::Find or as references or pointers to SS::Find. This requirement is due to the mechanism by which predicate objects are copied. An SS::Find object is either the base class portion of a complete object derived from SS::Finder or else it is a complete object that has been sliced to SS::Find. The act of slicing causes the complete object to be cloned and saved as a pointer to SS::Finder in the new SS::Find object.

Additional predicates are derived off SS::Finder. If the predicate contains data that cannot be copied as values it should supply a copy constructor, destructor, and a SS::Finder* clone() const; member. It should also supply a clone if it implements the found virtual function. It is intended that the predicate objects be stateless, i.e. the result of a match not depend upon the result of a previous match, but that’s only due to the potential confusion and unpredictability. Finally, a predicate that can be constructed with a single parameter of type SS::Find const & should have a protected or private copy constructor.
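The role of clone can be shown with a simplified arrangement. In this sketch the handle and the predicate base are unrelated classes, which differs from the SS::Find / SS::Finder relationship described above; it is meant only to illustrate why a clone member lets predicates be stored and copied by value without losing their derived type. All names (`Finder` as redefined here, `FindHandle`, `CharIs`) are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical base for concrete predicates.
struct Finder {
    virtual ~Finder() {}
    virtual bool found(const std::string& s, int pos, int& len) const = 0;
    virtual Finder* clone() const = 0;  // heap copy of the complete object
};

// Hypothetical value-type handle: copying it clones the predicate it holds.
class FindHandle {
public:
    FindHandle(const Finder& f) : impl_(f.clone()) { assert(impl_ != nullptr); }
    FindHandle(const FindHandle& other) : impl_(other.impl_->clone()) {}
    FindHandle& operator=(const FindHandle& other) {
        if (this != &other) impl_.reset(other.impl_->clone());
        return *this;
    }
    bool found(const std::string& s, int pos, int& len) const {
        return impl_->found(s, pos, len);
    }
private:
    std::unique_ptr<Finder> impl_;
};

// Example predicate: matches one specific character.
struct CharIs : Finder {
    char wanted;
    explicit CharIs(char c) : wanted(c) {}
    bool found(const std::string& s, int pos, int& len) const override {
        if (pos < 0 || pos >= static_cast<int>(s.size()) || s[pos] != wanted)
            return false;
        len = 1;
        return true;
    }
    Finder* clone() const override { return new CharIs(*this); }
};
```

A predicate that forgets to override clone would be copied as its base class and lose its own found, which is the hazard the clone requirement guards against.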

Find Predicates
Find base class of Super String’s find hierarchy
Finder base class for deriving additional Find classes
FindChar (char c); matches if the character at the search position is c and returns the character
FindCharNoCase (char c); like FindChar but case insensitive
FindFunc (Func f); calls f with the character at the search position and, if true, returns that character
Func is of type bool(*)(char c)
FindSet (SS const & charset); matches any character in charset and returns a match
FindString (SS const & str); matches str at the search position and returns it
FindVector (int n=0); a base class to manage a vector of find predicates
FindSequence (F f0, F f1);
FindSequence (F f0, F f1, F f2);
FindSequence (F f0, F f1, F f2, F f3);
FindSequence (F f0, F f1, F f2, F f3, …);
the arguments are all of type SS::Find const &
each match is performed at the character following the previous, and if all match the union is returned
additional components can be added with void add (F f);
FindStringNoCase (SS const & s); does a case insensitive match of s at the search position
FindProxy (Find const & f); base class for find predicates that take a SS::Find const & as an argument
FindNot (Find const & f); matches if f doesn’t match; doesn’t match if f matches
the length of the match is determined by f and may not be meaningful
FindOr (F f0, F f1);
FindOr (F f0, F f1, F f2);
FindOr (F f0, F f1, F f2, F f3);
FindOr (F f0, F f1, F f2, F f3, …);
the arguments are all of type SS::Find const &
all the components are tried and the longest match, if any, is returned
additional components can be added with void add (F f);
FindAnd (F f0, F f1);
FindAnd (F f0, F f1, F f2);
FindAnd (F f0, F f1, F f2, F f3);
FindAnd (F f0, F f1, F f2, F f3, …);
the arguments are all of type SS::Find const &
all the components are tried and if all match, the longest one is returned
additional components can be added with void add (F f);
FindCharCompare (char w, Comp comp); compares the character at the search position to w using the comparison function comp
Comp is of type bool(*)(char u, char w);
with the first parameter from the string, the second from the supplied character
FindCharGreaterThan (char w); like FindCharCompare, requiring the character in the string to be greater than w
FindCharGreaterThanEqual (char w); like FindCharCompare, requiring the character in the string to be greater than or equal to w
FindCharLessThan (char w); like FindCharCompare, requiring the character in the string to be less than w
FindCharLessThanEqual (char w); like FindCharCompare, requiring the character in the string to be less than or equal to w
FindCharInRange (char w_low, char w_high); find a character between w_low and w_high, inclusive
FindCompare (Find const & f, SS const & w, Comp comp); matches f and then compares a successful match to w using the comparison function comp
Comp is of type bool(*)(SS const & u, SS const & w);
with the first parameter from the successful match, the second from the supplied string
FindGreaterThan (Find const & f, SS const & w); like FindCompare, requiring the successful match to be greater than w
FindGreaterThanEqual (Find const & f, SS const & w); like FindCompare, requiring the successful match to be greater than or equal to w
FindLessThan (Find const & f, SS const & w); like FindCompare, requiring the successful match to be less than w
FindLessThanEqual (Find const & f, SS const & w); like FindCompare, requiring the successful match to be less than or equal to w
FindInRange (Find const & f, SS const & w_low, SS const & w_high); find a match for f between w_low and w_high, inclusive
FindBool (bool b, int l=0); either always or never a match, with length of l
l may be SS::fullength
FindPosition (int n); matches, with length 1, if the position in the string is n
n may be SS::maxindex
FindRange (int n, int l); matches, with length l, if the position in the string is n and there are at least l characters remaining
n may be SS::maxindex and l may be SS::fullength
FindDisplacedBy (Find const & f, int n); matches f, displaced by n characters
n > 0 tries f at n characters forward, i.e. the found position will be n characters before the match
FindMultiple (Find const & f, int n=1); matches f, exactly n times in sequence
FindZeroOrMore (Find const & f); matches f as many times as possible, in sequence
FindOneOrMore (Find const & f); matches f at least once and as many times as possible, in sequence
FindZeroOrOne (Find const & f); matches f either not present or once
FindAtLeast (Find const & f, int n); matches f at least n times and as many times as possible, in sequence
FindUpTo (Find const & f, int n); matches f at least once and up to n times, in sequence
FindAfter (Find const & f0, Find const & f1); matches f0 and then searches for f1, starting at the first character after f0’s match.
a successful match includes all the text from both matches and everything in between
FindWithin (Find const & f0, Find const & f1); matches f0 and then searches for f1 within f0’s match and if found f0’s match is returned
FindSeparatedBy (Find const & f0, Find const & f1, int n); like FindAfter, but the two matches must be separated by exactly n characters
FindCloserThan (Find const & f0, Find const & f1, int n); like FindAfter, but the two matches must be separated by fewer than n characters
FindFartherThan (Find const & f0, Find const & f1, int n); like FindAfter, but the two matches must be separated by more than n characters
FindDelimit (char c);
FindDelimit (char left, char right);
matches the text between a left and a right delimiting character or just a single delimiting character
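
The predicates above compose: simple ones test a single character, and the wrappers repeat or sequence them. The following is a minimal standalone sketch of that idea, not Super String's implementation. The names Matcher, charInRange, multiple, and oneOrMore are hypothetical stand-ins for FindCharInRange, FindMultiple, and FindOneOrMore, and the convention that a matcher returns a match length or -1 is an assumption made for the sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical model: a matcher reports the length of a match that starts
// exactly at pos, or -1 if there is no match there.
using Matcher = std::function<int(std::string const& s, int pos)>;

// Stand-in for FindCharInRange: one character between low and high, inclusive.
Matcher charInRange(char low, char high) {
    return [=](std::string const& s, int pos) -> int {
        if (pos < 0 || pos >= static_cast<int>(s.size())) return -1;
        return (s[pos] >= low && s[pos] <= high) ? 1 : -1;
    };
}

// Stand-in for FindMultiple: f exactly n times in sequence.
Matcher multiple(Matcher f, int n) {
    assert(n >= 0);
    return [=](std::string const& s, int pos) -> int {
        int total = 0;
        for (int i = 0; i < n; ++i) {
            int len = f(s, pos + total);
            if (len < 0) return -1;
            total += len;
        }
        return total;
    };
}

// Stand-in for FindOneOrMore: f at least once, then as many times as possible.
Matcher oneOrMore(Matcher f) {
    return [=](std::string const& s, int pos) -> int {
        int total = f(s, pos);
        if (total <= 0) return total;   // no match, or a zero-length match
        for (;;) {
            int len = f(s, pos + total);
            if (len <= 0) break;        // stop on failure or lack of progress
            total += len;
        }
        return total;
    };
}
```

With these, oneOrMore(charInRange('0', '9')) plays the role that FindOneOrMore(FindCharInRange('0', '9')) would play in the table above.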


Predefined Find Objects

These are objects of type static const SS::Find that have been initialized with some useful values. They can be used with any member function or SS::Find constructor requiring an object of type SS::Find const & or SS::Find. See Find Predicates.

Predefined Find Objects
FindFunc whitespace matches any whitespace character
FindFunc blackspace matches any human readable character
FindFunc lowercase matches a-z
FindFunc uppercase matches A-Z
FindFunc alpha matches a-z and A-Z
FindFunc digit matches 0-9
FindFunc alphanumeric matches a-z, A-Z, and 0-9
FindFunc punct matches any blackspace that isn’t alphanumeric
FindFunc printable matches blackspace plus the space character
FindFunc hexdigit matches 0-9, a-f, and A-F
FindFunc cntrl matches any control character
FindFunc graph same as blackspace
FindFunc findtrue always a match, length 0
FindFunc findfalse never matches
FindFunc anychar matches any single character
FindFunc frontposition matches only at the beginning of the string
FindFunc backposition matches only at the last character of the string
FindFunc endofline matches at the character preceding a one- or two-character end of line, or at the last character of the string
FindFunc singlequotedelimit matches the text inside a pair of '
FindFunc doublequotedelimit matches the text inside a pair of "
FindFunc parendelimit matches the text between ( and )
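
Several of the character classes above are defined in terms of each other (punct from blackspace and alphanumeric, printable from blackspace). A rough sketch of those relationships using the standard <cctype> functions follows; it assumes the default "C" locale and 7-bit ASCII, and the function names are hypothetical, not part of Super String.

```cpp
#include <cassert>
#include <cctype>

// blackspace (same as graph): any visible character, i.e. not space or control.
bool isBlackspaceSketch(char c) {
    return std::isgraph(static_cast<unsigned char>(c)) != 0;
}

// punct: any blackspace that is not alphanumeric.
bool isPunctSketch(char c) {
    return isBlackspaceSketch(c) &&
           std::isalnum(static_cast<unsigned char>(c)) == 0;
}

// printable: blackspace plus the space character.
bool isPrintableSketch(char c) {
    return isBlackspaceSketch(c) || c == ' ';
}

// Compare the derived classes with <cctype> over the 7-bit ASCII range.
bool agreesWithCctype() {
    for (int i = 0; i < 128; ++i) {
        char c = static_cast<char>(i);
        if (isPunctSketch(c) != (std::ispunct(i) != 0)) return false;
        if (isPrintableSketch(c) != (std::isprint(i) != 0)) return false;
    }
    return true;
}
```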


Numerical Operations

Convert a Super String to a number or explicitly convert a number to a Super String. Implicit conversion is via constructor, in base 10. See Supported Types.

  • toX – convert to X.
  • toX (base) – convert to X, assuming some base.
  • toInteger – also convert to X.
  • toInteger (base) – also convert to X, assuming some base.
  • toBase – construct a Super String from a number, given some base.

toX names the result type explicitly; toInteger takes the result type from the number argument it is given. The latter is useful in templates, where the type is not known by name.

Note that the string itself knows nothing about base: two Super Strings created from the same number but different base will not compare as equal.

The above throw SS::ErrorNumberFormat; see Exceptions.
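
To make the point about base concrete, here is a small standalone sketch. The functions toBaseSketch and toULongSketch are hypothetical stand-ins written with the standard library, not Super String's toBase and toULong; the lowercase digit set and the base limits of 2 to 36 are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical stand-in for toBase: render number in the given base.
std::string toBaseSketch(unsigned long number, int base = 10) {
    assert(base >= 2 && base <= 36);
    char const digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    std::string out;
    do {
        out += digits[number % base];   // least significant digit first
        number /= base;
    } while (number != 0);
    std::reverse(out.begin(), out.end());
    return out;
}

// Hypothetical stand-in for toULong (base), built on std::stoul.
unsigned long toULongSketch(std::string const& s, int base = 10) {
    return std::stoul(s, nullptr, base);
}
```

The same number rendered in base 16 and base 10 yields two different strings, which compare as unequal; only converting back with the matching base recovers equal numbers.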

Numerical Operations
toX
bool toBool () const;
short toShort () const;
unsigned short toUShort () const;
int toInt () const;
unsigned int toUInt () const;
long toLong () const;
unsigned long toULong () const;
LongLong toLongLong () const;
ULongLong toULongLong () const;
float toFloat () const;
double toDouble () const;
long double toLongDouble () const;
convert to a number
base 10 is used, where appropriate
toX (base)
short toShort (int base) const;
unsigned short toUShort (int base) const;
int toInt (int base) const;
unsigned int toUInt (int base) const;
long toLong (int base) const;
unsigned long toULong (int base) const;
LongLong toLongLong (int base) const;
ULongLong toULongLong (int base) const;
convert to a number, assuming string is in base
toInteger
void toInteger (short & number) const;
void toInteger (unsigned short & number) const;
void toInteger (int & number) const;
void toInteger (unsigned int & number) const;
void toInteger (long & number) const;
void toInteger (unsigned long & number) const;
void toInteger (LongLong & number) const;
void toInteger (ULongLong & number) const;
convert to a number
base 10 is used, where appropriate
toInteger (base)
void toInteger (short & number, int base) const;
void toInteger (unsigned short & number, int base) const;
void toInteger (int & number, int base) const;
void toInteger (unsigned int & number, int base) const;
void toInteger (long & number, int base) const;
void toInteger (unsigned long & number, int base) const;
void toInteger (LongLong & number, int base) const;
void toInteger (ULongLong & number, int base) const;
convert to a number, assuming string is in base
toBase
static SS toBase (short number, int base=10);
static SS toBase (unsigned short number, int base=10);
static SS toBase (int number, int base=10);
static SS toBase (unsigned int number, int base=10);
static SS toBase (long number, int base=10);
static SS toBase (unsigned long number, int base=10);
static SS toBase (LongLong number, int base=10);
static SS toBase (ULongLong number, int base=10);
create a string, representing number in base
SS toHex () const; convert a string, byte by byte, to hex
SS fromHex () const; convert hex digits back into bytes
SS outputHex () const; pretty print string in hex
long hash () const; produce a hash value for the string
static SS sprint (const char * fmt, …); do a sprintf into a new string
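
As an illustration of what a byte-by-byte hex conversion and its inverse look like, here is a standalone sketch using std::string. The names toHexSketch and fromHexSketch are hypothetical, and details such as lowercase output and rejecting odd-length input are assumptions; Super String's toHex and fromHex may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each input byte becomes two hex digits.
std::string toHexSketch(std::string const& bytes) {
    char const digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0f];
    }
    return out;
}

// Value of one hex digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Each pair of hex digits becomes one byte.
std::string fromHexSketch(std::string const& hex) {
    assert(hex.size() % 2 == 0);
    std::string out;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int hi = hexValue(hex[i]);
        int lo = hexValue(hex[i + 1]);
        assert(hi >= 0 && lo >= 0);
        out += static_cast<char>(hi * 16 + lo);
    }
    return out;
}
```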


Supported Types

A type is considered fully supported if an object of that type can convert to and from a Super String, be assigned to or concatenated with a Super String, and compared to a Super String using any relational operator. This section describes the mechanism to accomplish these operations. Here are the types currently fully supported:

Supported Types
char const * unsigned char const * signed char const *
char unsigned char signed char
bool
short unsigned short
int unsigned int
long unsigned long
__int64 unsigned __int64
float double long double
std::string std::vector<char>

Arbitrary types can be added by means of non-member functions that are called by the appropriate template member function. For example, to allow an object of type X to convert to a Super String, supply an SSconvert (X, SS&) rather than X::operator SS().

Member Implementation
template <class T> SS (T const & t); construct a Super String from any supported type
template <class T> SS& operator = (T const & t); assign an object of a supported type to a string
SS& operator = (SS const & s); copy assignment must be explicitly defined
int compare (X x) const; built in compares that can be used for relational operator support
void assign (X x); built in assigns that can be used for assignment support
template <class T> inline static SS toSS (T const & t); explicitly construct a string from a supported type
template <class T> SS const & toType (T & t); convert a string to a supported type

These functions implement Super String support for a given type. Additional supported types can be added by the user. It is not required that all these functions be supplied. SSconvert is typically the most useful. Note that a problematic template <class T> SS::operator T() could be defined.

Non-member Implementation
void SSconvert (X x, SS & s); used by template construction, assignment, and concatenation
the string representation of x should be assigned to s
void SSconvertFrom (SS const & s, X& x); used by template conversion to X
the value of s as X should be assigned to x
int SScompare (SS const & s, X x); used by the template relational operators
return less than zero, zero, or greater than zero for s less than, equal, or greater than x
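
The extension mechanism can be sketched in miniature as follows. MiniSS and Point are hypothetical types invented for the sketch, and MiniSS is not Super String; only the shape of SSconvert (X, SS&) is taken from the table above. The template constructor makes an unqualified call to SSconvert, so an overload supplied later by the user is picked up when the constructor is instantiated.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the pattern; MiniSS is not Super String.
class MiniSS {
public:
    MiniSS() {}
    // Template constructor: defers to a non-member SSconvert for type T.
    template <class T> MiniSS(T const& t) { SSconvert(t, *this); }
    void assign(std::string const& v) { value_ = v; }
    std::string const& str() const { return value_; }
private:
    std::string value_;
};

// A user type that should become convertible to MiniSS.
struct Point {
    int x;
    int y;
};

// User-supplied conversion: assign the string representation of p to s.
void SSconvert(Point p, MiniSS& s) {
    s.assign("(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")");
}
```

Nothing in Point had to change, which is the advantage over adding a conversion operator to the user type.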

Binary operations with Super String are implemented as member template functions for Super String on the left; as non-member template functions for Super String on the right. Concatenation is performed by converting the operand to Super String and concatenating the strings. Thus, the corresponding SSconvert must be defined.

Member Concatenation
template <class T> SS operator + (T const & t);
template <class T> SS& operator += (T const & t);

Non-member Concatenation
template <class T> inline SS operator + (T const & u, SS const & w);

Relational operators require the definition of the appropriate SScompare. This additional function allows more flexibility and potentially greater performance than converting the operand to string and then comparing as strings. Comparisons are assumed to commute.

Member Relational Operators
template <class T> bool operator == (T const & t) const;
template <class T> bool operator != (T const & t) const;
template <class T> bool operator < (T const & t) const;
template <class T> bool operator > (T const & t) const;
template <class T> bool operator <= (T const & t) const;
template <class T> bool operator >= (T const & t) const;

Non-member Relational Operators
template <class T> inline bool operator == (T const & u, SS const & w);
template <class T> inline bool operator != (T const & u, SS const & w);
template <class T> inline bool operator < (T const & u, SS const & w);
template <class T> inline bool operator > (T const & u, SS const & w);
template <class T> inline bool operator <= (T const & u, SS const & w);
template <class T> inline bool operator >= (T const & u, SS const & w);
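
A miniature of the relational scheme, under the same caveats: MiniSS is a hypothetical stand-in and only two of the six operators are shown. The non-member forms call SScompare with the operands swapped and mirror the result, which is what "assumed to commute" amounts to in practice.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the pattern; MiniSS is not Super String.
class MiniSS {
public:
    explicit MiniSS(std::string const& v) : value_(v) {}
    std::string const& str() const { return value_; }

    // Member form: the string is on the left.
    template <class T> bool operator<(T const& t) const { return SScompare(*this, t) < 0; }
    template <class T> bool operator>(T const& t) const { return SScompare(*this, t) > 0; }
private:
    std::string value_;
};

// Non-member form: the string is on the right, so the result is mirrored.
template <class T> inline bool operator<(T const& u, MiniSS const& w) { return SScompare(w, u) > 0; }
template <class T> inline bool operator>(T const& u, MiniSS const& w) { return SScompare(w, u) < 0; }

// User-supplied comparison for int: compare numerically, not as text.
int SScompare(MiniSS const& s, int x) {
    long v = std::stol(s.str());
    return v < x ? -1 : (v > x ? 1 : 0);
}
```

Here "10" compares greater than 9 because SScompare works on the numeric value; a comparison of the text "10" against the text "9" would give the opposite answer.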


Bounds Checking and Adjusting

Bounds Checking

All single character accesses, i.e. SS::operator[](int ind) and SS::at(int ind), are checked to verify that ind is non-negative and less than the string’s length. This check has proven useful for finding coding errors. The performance hit for this feature is something that benchmarking and optimization need to address.

An out of range single character access is considered definitely to be an error.

Bounds Adjusting

Some of Super String’s methods allow negative lengths and/or starting points that lie outside the string. This is to facilitate usage in algorithms that generate endpoints that are out of range or arbitrarily ordered. Thus it is not necessary to manually truncate and order the range before use. The tradeoff for this feature is reduced error detection.

The range will be normalized into a valid range. A range completely outside of the string is a null operation. Methods that operate on a range include get, set, cut, removeRange, replaceRange, and operator(). An out-of-range position for paste will result in a prepend or an append.

An out of range multiple character access is considered possibly correct.
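
One plausible way to normalize a range is sketched below. The function normalizeRange is hypothetical, and the rule that a negative length extends the range backwards from the starting point is an assumption; the text above does not spell out the exact rule Super String uses.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Returns (beg, len) ordered and clipped to a string of the given size.
// Assumption: a negative len means the range extends backwards from beg.
std::pair<int, int> normalizeRange(int beg, int len, int size) {
    assert(size >= 0);
    long long lo = beg;
    long long hi = lo + len;
    if (hi < lo) std::swap(lo, hi);                              // order the endpoints
    lo = std::min<long long>(std::max<long long>(lo, 0), size);  // clip to [0, size]
    hi = std::min<long long>(std::max<long long>(hi, 0), size);
    return std::make_pair(static_cast<int>(lo), static_cast<int>(hi - lo));
}
```

A range that lies completely outside the string normalizes to a length of zero, which corresponds to the null operation described above.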


Exceptions

Some member functions throw exceptions. The root of the exception hierarchy is SS::Error, so catching it will also catch the rest. A char const * what() const method will give a descriptive error message. The exception can be caught as a reference or a value, i.e.

catch (SS::Error error)
will preserve the descriptive message.
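
Catching by value slices a derived exception down to the base type, so the message survives only if the base class holds it. The sketch below shows one way such a hierarchy could be arranged; the class layout is an assumption for illustration, not Super String's actual definition of SS::Error.

```cpp
#include <cassert>
#include <string>

// Hypothetical base: the message lives here, so slicing does not lose it.
class Error {
public:
    explicit Error(std::string const& msg) : msg_(msg) {}
    char const* what() const { return msg_.c_str(); }
private:
    std::string msg_;
};

// Hypothetical derived exception that only builds the message.
class ErrorOutOfRange : public Error {
public:
    explicit ErrorOutOfRange(int ind)
        : Error("index out of range: " + std::to_string(ind)) {}
};

// Throws the derived type and catches the base type by value.
std::string describe(int ind) {
    std::string message;
    try {
        throw ErrorOutOfRange(ind);
    } catch (Error error) {       // by value: sliced to Error, message kept
        message = error.what();
    }
    return message;
}
```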

Exceptions
Exception Thrown When or By
SS::ErrorBadArg any inappropriate use of SS::maxindex, SS::fullength, SS::allitems, or SS::notfound
see Constants
SS::ErrorBadState if internal flags are inconsistent
SS::ErrorNumberFormat toInteger, toBase, toBool, toInt, toDouble etc.
SS::ErrorOutOfRange operator[], at, and dup
see Bounds Checking
some of the SS::Find subclasses that require positive or non-negative integers
SS::ErrorOverflow sprint, if a buffer overflow is detected
SS::ErrorNotFound findNextString, findPrevString, findNextSubString, and findPrevSubString


Typedefs

See the STL documentation for information on std::vector<>. See Supporting Classes for Pair<>.

Typedefs
typedef Pair<int> BegLen; some methods use a single object for start and length values
also known as a range, see String Operations
typedef std::vector< Pair<int> > BegLenVect; so a single parameter can represent multiple ranges
typedef std::vector<int> PosVect; so a single parameter can represent multiple positions
typedef __int64 LongLong;
typedef unsigned __int64 ULongLong;
64 bit integral type
also known as long long on some platforms


Constants

Integral constants are currently implemented as enums.

Constants
notfound if the various search methods that return a position fail they will return this value instead
maxindex represents the last character in the string in contexts that require an index or starting position
fullength represents a length that makes a range as large as possible for an append or a paste operation
allitems when given as a value of count all items will be matched/removed/replaced
static const char nullchar; has the value '\0'
note that a plain 0 has type int
static const Pair<int> nomatch; returned by findNextMatch and findPrevMatch when no match is found


Supporting Classes

The nested class SS::Buffer allows the functionality of SS::buffer(int n) and SS::buffer(void const * v, int n) (see Buffer Operations) to be used in a constructor. The latter is needed because SS(void const * v, int n) (see Basic Operations) has copy semantics; the former because SS(int) (see Supported Types) means to convert as a decimal integer. Code that treats characters as ints also discourages such a constructor.

Super String’s buffer constructor SS(SS::Buffer const &) has two flavors: produce a string with a specific length or specify the region of memory for the string. This second flavor produces a string that has not allocated its memory, will not delete it, and with an internal char* representation that is not necessarily zero terminated. This last fact means that Super String’s implementation or code that accesses a Super String through its char* representation cannot, in general, assume zero termination.

The SS::Buffer object is passed to SS(SS::Buffer const &) and is processed as follows:

SS::Buffer
Buffer (int n=0, SS const & fillvalue=""); create a new string of length n, zero terminated, with a possible fill value
Buffer (void const * start, int n); assign start directly to Super String’s char* representation; assign n directly to the length
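
The consequence of the second flavor, a string that refers to memory it does not own and that may not be zero terminated, can be seen in a small standalone sketch. MiniView is a hypothetical class written for illustration and is not SS::Buffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical non-owning string: a start pointer plus a recorded length.
class MiniView {
public:
    MiniView(void const* start, int n)
        : data_(static_cast<char const*>(start)), len_(n) {
        assert(start != nullptr && n >= 0);
    }
    int length() const { return len_; }
    char const* data() const { return data_; }
    // Correct access uses the recorded length, never a search for '\0'.
    std::string copy() const { return std::string(data_, len_); }
private:
    char const* data_;   // not allocated here and never deleted here
    int len_;
};
```

For a view of 3 bytes into the middle of a longer C string, length() reports 3 while std::strlen on data() runs on to the end of the underlying array, which is why zero termination cannot be assumed.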

A substring is either a Super String or else an object that can be assigned to. This assignment is then performed by replacing the characters of the original string, in the given range, with the right hand side of the assignment. Alternately, the substring can be used as a Super String in which case a copy of the range in the original string is generated. SS::SubSS is not a user accessible type. See also String Operations.

SS::SubSS
SubSS (SS* s, int beg, int len); a SubSS has a reference to a range in a Super String
SS get () const; produce a new string with the value of SubSS's range
SS& operator = (SS s); assign s to the given range of a string, mediated by SubSS
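
The assignable-substring idea is a proxy object: it remembers the original string and a range, and either copies the range out or replaces it. A minimal sketch over std::string follows; SubRange is a hypothetical name, and the sketch requires a valid range rather than applying the bounds adjusting described earlier.

```cpp
#include <cassert>
#include <string>

// Hypothetical proxy for a range inside a std::string.
class SubRange {
public:
    SubRange(std::string* s, int beg, int len) : s_(s), beg_(beg), len_(len) {
        assert(s != nullptr && beg >= 0 && len >= 0);
        assert(static_cast<std::size_t>(beg) + len <= s->size());
    }
    // Use as a value: copy the range out of the original string.
    std::string get() const { return s_->substr(beg_, len_); }
    // Use as an assignment target: replace the range in the original string.
    std::string& operator=(std::string const& rhs) {
        s_->replace(beg_, len_, rhs);
        len_ = static_cast<int>(rhs.size());
        return *s_;
    }
private:
    std::string* s_;
    int beg_;
    int len_;
};
```

The replacement does not have to be the same length as the range it replaces; the rest of the string shifts accordingly.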

The class Pair is used by Super String as one way to represent a range (see String Operations). It is an instance of a template class.

Pair<int>
Pair (int t0, int t1); construct a pair of ints
int _0; data member for t0
int _1; data member for t1


STL Compatibility

Super String supports STL random-access iterators. See the STL documentation. See also Supported Types.

STL Compatibility
char * begin ();
char const * begin () const;
return an iterator pointing to the beginning of the string
char * end ();
char const * end () const;
return an iterator pointing one past the end of the string
bool Find::operator () (char c); allows an SS::Find object to be used as an STL predicate
only works for single character compares
see Find Predicates
std::string toString () const; convert the Super String to std::string
std::vector<char> toVector () const; convert the Super String to std::vector<char>
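
Because begin() and end() return plain char pointers, any standard algorithm that accepts random-access iterators applies. The sketch below uses a character array in place of a Super String and a hypothetical functor IsWhitespace in place of an SS::Find object; it shows only the general shape of using a single-character predicate with the algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>

// Hypothetical single-character predicate, usable wherever the standard
// algorithms expect a unary predicate.
struct IsWhitespace {
    bool operator()(char c) const {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

// Position of the first whitespace character in [begin, end), or -1.
int firstWhitespace(char const* begin, char const* end) {
    char const* hit = std::find_if(begin, end, IsWhitespace());
    return hit == end ? -1 : static_cast<int>(hit - begin);
}

// Number of whitespace characters in [begin, end).
int countWhitespace(char const* begin, char const* end) {
    return static_cast<int>(std::count_if(begin, end, IsWhitespace()));
}
```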


C Compatibility

For purposes of discussion, a C string is a sequence of characters with a zero (null character) at the end. A char* points to the first character. That is to say, C compatibility means the string is something that will work with the various standard library functions and any other code with the same expectations.

We can’t get perfect compatibility here. If user code tries to reallocate the string or truncate by embedding a null the change won’t be reflected in the string object. char* strings compare as pointers, not as values. Taking the address of a string object does not result in the address of a char*. A string object cannot also be an iterator the way a char* can.

How

C string support is implemented via:

char & operator [] (int ind);
char const & operator [] (int ind) const;
Access the string in the same way as indexing a char*
operator char * ();
operator char const * () const;
Produce a char* that can be used in the same way as an equivalent C string char*
if possible, a terminating null byte is placed at the end
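
A conversion operator is what lets a string object be passed directly to functions that expect a C string. The following sketch shows the idea with a hypothetical wrapper class MiniCStr around std::string; it is not Super String, and legacyLength stands in for any existing function with a char const * parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical wrapper with an implicit conversion to a C string.
class MiniCStr {
public:
    explicit MiniCStr(std::string const& v) : value_(v) {}
    operator char const*() const { return value_.c_str(); }
private:
    std::string value_;
};

// Stand-in for legacy code that only knows about char const *.
std::size_t legacyLength(char const* s) {
    return std::strlen(s);
}
```

A MiniCStr can be handed to legacyLength or std::strcmp without any explicit call, at the cost of the caveats listed later in this section.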

Why

  • Use with legacy C code.
  • Use with non-legacy code that happens to use char*.
  • An efficient way to transfer data among different systems or disparate objects.

As an example of the last, consider that many string classes allow conversion both to and from char*; the data flows easily from one to the next.

Details

Replacement of char* with Super String should be as transparent as possible, including assignment to, return values of, and function calls involving char*. Conversions to both const char* and char* are provided:

  • Legacy (or other) code may have non const correct prototyped functions and assignments.
  • Changing the characters of a char* string is a fairly common operation.
  • The string object can manage memory for a buffer; something else can fill it.

Caveats

Super String has value semantics; C strings have pointer or iterator semantics:

  • Assignment to a string object results in a copy of the char* string.
  • The + operators don’t result in an incremented char*, they append.
  • The ==, !=, >, <, >=, and <= operators compare values not pointers.
  • Pointer-style operators such as *, !, and ++ are not defined. If they were, they might not work in the same way as they do for char*.

There is no communication between the user of the char* string and the string object:

  • External memory management for the internal representation of the string is not supported. Specifically new, delete, malloc, and free cannot be performed.
  • Embedding a '\0' externally will not change the recorded length of the string. However, the length will be changed with regard to any further zero terminated logic.
  • Changing the zero termination byte will have the usual bad results, but only with zero terminated logic.
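
The mismatch between the recorded length and the zero terminated length is easy to reproduce with std::string, which behaves analogously in this respect. The sketch below is an analogy, not Super String code; Lengths and truncateExternally are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct Lengths {
    std::size_t recorded;        // what the string object reports
    std::size_t zeroTerminated;  // what strlen-style logic sees
};

// Writes a '\0' through a raw pointer, as external C code might.
Lengths truncateExternally(std::string s, std::size_t at) {
    assert(at < s.size());
    char* raw = &s[0];
    raw[at] = '\0';              // the string object is not told about this
    Lengths result = { s.size(), std::strlen(s.c_str()) };
    return result;
}
```

For "hello" with a null written at index 2, the object still records 5 characters while zero terminated logic sees 2.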
