Practical C++ Programming, 2nd Edition
By Steve Oualline
http://www.oreilly.com/catalog/cplus2/?CMP=OT16469
O’Reilly & Associates, December 2002
ISBN: 0-596-00419-2
Chapter 16: File Input/Output
I am the heir of all the ages, in the foremost files of time.
–Tennyson
A file is a collection of related data. C++ treats a file as a series of bytes. Many files reside on disk; however, devices such as printers, magnetic tapes, and communication lines are also considered files.
This chapter discusses three different I/O packages. The first is the C++ I/O stream classes. This is the most commonly used I/O system and the one we’ve been using up to now. Next, we examine the raw I/O routines that give us direct access to the low-level I/O. Finally we look at the C I/O system. Although it is somewhat outdated, C I/O calls still appear in old code. Also, in some cases, the C-style I/O routines are superior to the ones provided with C++.
C++ File I/O
C++ file I/O is based on three classes: the istream class for input, the ostream class for output, and the iostream class for input/output. C++ refers to files as streams since it considers them a stream of bytes. Four class variables are automatically created when you start a program. These are listed in Table 16-1.
Variable | Use |
---|---|
cin | Console input (standard input) |
cout | Console output (standard output) |
cerr | Console error (standard error) |
clog | Console log |
These variables are defined in the standard include file <iostream>. Normally, std::cin is assigned to the keyboard and std::cout, std::cerr, and std::clog are assigned to the screen. Most operating systems allow you to change these assignments through I/O redirection (see your operating system manual for details).
For example, the command:
my_prog <file.in
runs the program my_prog and assigns std::cin to the file file.in.
When doing I/O to disk files (except through redirection), you must use the file version of the stream classes. These are std::ifstream, std::ofstream, and std::fstream and are defined in the include file <fstream>.
Suppose you want to read a series of 100 numbers from the file numbers.dat. You start by declaring the input file variable:
std::ifstream data_file; // File we are reading the data from
Next you need to tell C++ what disk file to use. This is done through the open member function:
data_file.open("numbers.dat");
Now you can read the file using the same statements you’ve been using to read std::cin:
for (i = 0; i < 100; ++i) {
assert(i >= 0);
assert(i < sizeof(data_array)/sizeof(data_array[0]));
data_file >> data_array[i];
}
Finally you need to tell the I/O system that you are done with the file:
data_file.close( );
Closing the file frees resources that can then be used again by the program.
C++ allows the open call to be combined with the constructor. For example, instead of writing:
std::ifstream data_file; // File we are reading the data from
data_file.open("numbers.dat");
you can write:
std::ifstream data_file("numbers.dat"); // File we are reading the data from
Additionally, the destructor automatically calls close.
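But what if the file numbers.dat is missing? How can you tell if there is a problem? A failed open sets the stream's failure flag, so the member function fail returns true if the file could not be opened, and false otherwise. (The member function bad reports only unrecoverable I/O errors and stays false when a file is merely missing.) So to test for problems, all you need is:
if (data_file.fail( )) {
    std::cerr << "Unable to open numbers.dat\n";
    exit (8);
}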
A better version of the program for reading numbers is listed in Example 16-1.
Example 16-1:
read/read.cpp
/********************************************************
* read -- read in 100 numbers and sum them *
* *
* Usage: *
* read *
* *
* Numbers are in the file "numbers.dat" *
* *
* Warning: No check is made for a file with less than *
* 100 numbers in it. *
********************************************************/
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cassert>
int main( )
{
const int DATA_SIZE = 100; // Number of items in the data
int data_array[DATA_SIZE]; // The data
std::ifstream data_file("numbers.dat"); // The input file
int i; // Loop counter
if (data_file.fail( )) {
    std::cerr << "Error: Could not open numbers.dat\n";
    exit (8);
}
for (i = 0; i < DATA_SIZE; ++i) {
assert(i >= 0);
assert(i < sizeof(data_array)/sizeof(data_array[0]));
data_file >> data_array[i];
}
int total; // Total of the numbers
total = 0;
for (i = 0; i < DATA_SIZE; ++i) {
assert(i >= 0);
assert(i < sizeof(data_array)/sizeof(data_array[0]));
total += data_array[i];
}
std::cout << "Total of all the numbers is " << total << '\n';
return (0);
}
If you want to read a line of data, you need to use the getline function. It is defined as:[1]
std::istream& getline(std::istream& input_file,
std::string& the_string);
std::istream& getline(std::istream& input_file,
std::string& the_string, char delim)
This function reads a line and stores it in a string. The function returns a reference to the input stream. The second form of the function allows you to specify your own end-of-line delimiter. If this is not specified, it defaults to newline ('\n').
Reading C-Style Strings
To read C-style strings, you can use the getline member function. (This is a different function from the std::getline discussed in the previous section, which works on std::string.) This getline member function is defined as:
std::istream& getline(char *buffer, int len, char delim = '\n')
The parameters to this function are:
- buffer
- A C-style string in which to store the data that has been read.
- len
- Length of the buffer in bytes. The function reads up to len-1 bytes of data into the buffer. (One byte is reserved for the terminating null character \0.) This parameter is usually sizeof(buffer).
- delim
- The character used to signal end-of-line.
This function returns a reference to the input file. The function reads up to and including the end-of-line character (delim, which is '\n' by default). The end-of-line character is not stored in the buffer. (An end-of-string ('\0') is stored to terminate the string.)
For example:
char buffer[30];
std::cin.getline(buffer, sizeof(buffer));
Output Files
The functions for output files are similar to input files. For example, the declaration:
std::ofstream out_file("out.dat");
creates a file named out.dat and lets you write to the file using the file variable out_file.
Actually, the constructor can take two additional arguments. The full definition of the output file constructor is:
std::ofstream::ofstream(const char *name, int mode=std::ios::out,
int prot = filebuf::openprot);
The parameters for this function are:
- name
- The name of the file.
- mode
- A set of flags ORed together that determine the open mode. The flag std::ios::out is required for output files. Other flags are listed in Table 16-2. (The std::ios:: prefix is used to indicate the scope of the constant. This operator is discussed in more detail in Chapter 21.)
- prot
- File protection. This is an operating system-dependent value that determines the protection mode for the file. In Unix the protection defaults to 0644 (read/write owner, group read, others read). For MS-DOS/Windows this defaults to 0 (normal file).
Flag | Meaning |
---|---|
std::ios::app | Append data to the end of the output file. |
std::ios::ate | Go to the end of the file when opened. |
std::ios::in | Open for input (must be supplied to the open member function of |
std::ios::out | Open file for output (must be supplied to the open member function of |
std::ios::binary | Binary file (if not present, the file is opened as an ASCII file). See the later section "Binary I/O" for a definition of a binary file. |
std::ios::trunc | Discard contents of existing file when opening for write. |
std::ios::nocreate | Fail if the file does not exist. (Output files only. Opening an input file always fails if there is no file.) This flag comes from pre-standard libraries and is not part of standard C++. |
std::ios::noreplace | Do not overwrite existing file. If a file exists, cause the open to fail. This flag was not part of standard C++ before C++23. |
For example, the statement:
std::ofstream out_file("data.new", std::ios::out|std::ios::binary|std::ios::nocreate|
std::ios::app);
appends (std::ios::app) binary data (std::ios::binary) to an existing file (std::ios::nocreate) named data.new.
Example 16-2 contains a short function that writes a message to a log file. The first thing the function does is to open the file for output (std::ios::out), appending (std::ios::app), with the writing to start at the end of the file (std::ios::ate). It then writes the message and closes the file (the destructor for out_file performs the close).
This function was designed to be simple, which it is. But also we didn’t care about efficiency, and as a result this function is terribly inefficient. The problem is that we open and close the file every time we call log_message. Opening a file is an expensive operation, and things would go much faster if we opened the file only once and remembered that we had it open in subsequent calls.
Example 16-2:
log/log.cpp
#include <iostream>
#include <fstream>
#include <string>
void log_message(const std::string& msg)
{
    std::ofstream out_file("data.log",
        std::ios::out|std::ios::app|std::ios::ate);
    if (out_file.fail( ))
        return; /* Where do we log an error if there is no log */
    out_file << msg << std::endl;
}
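The inefficiency described above can be avoided by opening the log once and keeping the stream open between calls. The following is a minimal sketch of that idea, not the book's own solution. The class name message_log and its interface are hypothetical.

```cpp
#include <fstream>
#include <string>

// Hypothetical class: opens the log file once and keeps it open,
// so each message costs a write rather than an open/write/close.
class message_log {
    public:
        explicit message_log(const char *file_name)
            : out_file(file_name, std::ios::out|std::ios::app) {}

        // Append one message. Returns false if the log is unusable.
        bool write(const std::string& msg)
        {
            if (!out_file)
                return false;
            out_file << msg << std::endl;   // std::endl also flushes
            return !out_file.fail();
        }
    private:
        std::ofstream out_file;   // closed automatically by the destructor
};
```

A program would create one message_log object (for example, on data.log) and call write on it wherever log_message was called before.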
Conversion Routines
So far we have just considered writing characters and strings. In this section, we consider some of the more sophisticated I/O operations: conversions.
To write a number to a printer or terminal, you must convert the number to characters. The printer understands only characters, not numbers. For example, the number 567 must be converted to the three characters "5", "6", and "7" to be printed.
The << operator is used to convert data to characters and put them in a file. This function is extremely flexible. It can convert a simple integer into a fixed- or variable-size string as a hex, octal, or decimal number with left or right justification. So far you’ve been using the default conversion for your output. It serves pretty well, but if you want to control your output exactly, you need to learn about conversion flags.
The member functions setf and unsetf are used to set and clear the flags that control the conversion process. The general form of the functions is:
file_var.setf(flags); // Set flags
file_var.unsetf(flags); // Clear flags
Table 16-3 lists the various flags and their meanings.
Flag | Meaning |
---|---|
std::ios::skipws | Skip leading whitespace characters on input. |
std::ios::left | Output is left justified. |
std::ios::right | Output is right justified. |
std::ios::internal | Numeric output is padded by inserting a fill character between the sign or base character and the number itself. |
std::ios::boolalpha | Use the character version of true and false ("true", "false") for input and output. |
std::ios::dec | Output numbers in base 10, decimal format. |
std::ios::oct | Output numbers in base 8, octal format. |
std::ios::hex | Output numbers in base 16, hexadecimal format. |
std::ios::showbase | Print out a base indicator at the beginning of each number. For example, hexadecimal numbers are preceded with "0x". |
std::ios::showpoint | Show a decimal point for all floating-point numbers whether or not it's needed. |
std::ios::uppercase | When converting hexadecimal numbers, show the digits A-F as uppercase. |
std::ios::showpos | Put a plus sign before all positive numbers. |
std::ios::scientific | Convert all floating-point numbers to scientific notation on output. |
std::ios::fixed | Convert all floating-point numbers to fixed point on output. |
std::ios::unitbuf | Flush the output buffer after each output operation. (More on buffering later.) |
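If you want to output a number in hexadecimal format, all you have to do is this (the second argument to setf clears the other base flags first, since only one of dec, oct, and hex may be set at a time):
number = 0x3FF;
std::cout << "Dec: " << number << '\n';
std::cout.setf(std::ios::hex, std::ios::basefield);
std::cout << "Hex: " << number << '\n';
std::cout.setf(std::ios::dec, std::ios::basefield);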
When run, this program produces the output:
Dec: 1023
Hex: 3ff
TIP: People normally expect the output mode to be decimal, so it is a good idea to reset the mode after each output to avoid later confusion.
When converting numbers to characters, the member function:
int file_var.width(int size);
determines the minimum number of characters to use. For example, the number 3 would normally convert to the character string "3" (note the lack of spaces). If the width is set to four, the result would be "   3": three spaces of padding followed by the digit.
The member function:
int file_var.precision(int digits);
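controls how many digits are printed after the decimal point when the fixed or scientific flag is set. With neither flag set, it limits the total number of significant digits instead.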
Finally, the function:
char file_var.fill(char pad);
determines the fill character. This character is used for padding when a number is smaller than the specified width.
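The three member functions can be tried out on a string stream, which makes the padding easy to inspect. This is a small sketch. The helper names pad_number and fixed_text are hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: convert value to text, padded on the left
// with pad until it is at least width characters long.
std::string pad_number(int value, int width, char pad)
{
    std::ostringstream out;
    out.width(width);   // minimum field width for the next output
    out.fill(pad);      // character used for the padding
    out << value;
    return out.str();
}

// Hypothetical helper: fixed-point text with the given number of
// digits after the decimal point.
std::string fixed_text(double value, int digits)
{
    std::ostringstream out;
    out.setf(std::ios::fixed, std::ios::floatfield);
    out.precision(digits);
    out << value;
    return out.str();
}
```

For instance, pad_number(3, 4, '*') yields "***3", and a value wider than the requested width is never truncated.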
TIP:
Some of these flags and parameters are reset after each output call and some are not. Which flags are permanent and which are temporary seems to change from compiler to compiler. In general, don't assume anything is going to remain set and you'll be okay. (Just because you're paranoid doesn't mean the compiler isn't out to get you.)
These functions can be called directly, or you can use an I/O manipulator. An I/O manipulator is a special function that can be used in an I/O statement to change the formatting. You can think of a manipulator as a magic bullet that, when sent through an input or output file, changes the state of the file. A manipulator doesn't cause any output; it just changes the state. For example, the manipulator hex changes the output conversion to hexadecimal.
#include <iostream>
number = 0x3FF;
std::cout << "Number is " << std::hex << number << std::dec << '\n';
The header file <iostream> defines a basic set of manipulators. Table 16-4 contains a list of these manipulators.
Manipulator | Description |
---|---|
std::dec | Output numbers in decimal format. |
std::hex | Output numbers in hexadecimal format. |
std::oct | Output numbers in octal format. |
std::ws | Skip whitespace on input. |
std::endl | Output end-of-line. |
std::ends | Output end-of-string ('\0'). |
std::flush | Force any buffered output out. (See Chapter 17 for an explanation of how to use this function.) |
The more advanced set of manipulators (see Table 16-5) is defined in the header file <iomanip>.
Manipulator | Description |
---|---|
std::setiosflags(long flags) | Set selected conversion flags. |
std::resetiosflags(long flags) | Reset selected flags. |
std::setbase(int base) | Set conversion base to 8, 10, or 16. Sort of a generalized |
std::setw(int width) | Set the width of the output. |
std::setprecision(int precision) | Set the precision of floating-point output. |
std::setfill(char ch) | Set the fill character. |
Example 16-3 shows how some of the I/O manipulators may be used.
Example 16-3:
io/io.cpp
#include <iostream>
#include <iomanip>
int main( )
{
int number = 12; // A number to output
float real = 12.34; // A real number
std::cout << "123456789012345678901234567890\n"; // output ruler
std::cout << number << "<-\n";
std::cout << std::setw(5) << number << "<-\n";
std::cout << std::setw(5) << std::setfill('*') <<
number << "<-\n";
std::cout << std::setiosflags(std::ios::showpos|std::ios::left) <<
std::setw(5) << number << "<-\n";
std::cout << real << "<-\n";
std::cout << std::setprecision(1) <<
std::setiosflags(std::ios::fixed) << real << "<-\n";
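    std::cout << std::resetiosflags(std::ios::fixed) <<
        std::setiosflags(std::ios::scientific) << real << "<-\n";
    return (0);
}
The output of this program is shown below. Note that the fixed flag must be cleared before scientific is set, and that showpos, once set, also applies to the floating-point values that follow:
123456789012345678901234567890
12<-
   12<-
***12<-
+12**<-
+12.34<-
+12.3<-
+1.2e+01<-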
Binary and ASCII Files
So far we have limited ourselves to ASCII files. "ASCII" stands for American Standard Code for Information Interchange. It is a set of 95 printable characters and 33 control codes. (A complete list of ASCII codes can be found in Appendix A.) ASCII files are human-readable. When you write a program, the prog.cc file is ASCII.
Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters ("1", "2", "3", and "4") and written.
Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. This is done by the >> operator.
The ASCII character "0" has the value 48, "1" the value 49, and so on. When you want to convert a single digit from ASCII to integer, you must subtract 48:
int integer;
char ch;
ch = '5';
integer = ch - 48;
std::cout << "Integer " << integer << '\n';
Rather than remember that the character "0" is 48, you can just subtract '0':
integer = ch - '0';
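The single-digit conversion extends naturally to whole numbers: multiply the running result by 10 and add the next digit. The sketch below shows the idea for non-negative decimal numbers only. The helper name digits_to_int and its -1 error convention are hypothetical, and no overflow check is made.

```cpp
#include <string>

// Hypothetical helper: convert a string of decimal digits to an integer.
// Returns -1 if the string is empty or contains a non-digit character.
// No check is made for numbers too large to fit in an int.
int digits_to_int(const std::string& text)
{
    if (text.empty())
        return -1;

    int result = 0;   // the number built so far
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        char ch = text[i];
        if (ch < '0' || ch > '9')
            return -1;
        result = result * 10 + (ch - '0');   // shift left one digit, add new one
    }
    return result;
}
```

This is roughly the work the >> operator performs each time it reads an integer from an ASCII file.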
Computers work on binary data. When reading numbers from an ASCII file, the program must process the character data through a conversion routine like the integer conversion routine just defined. This is expensive. Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer. (If you’ve ever seen a long printout coming out of the printer displaying pages with a few characters at the top that look like "!E#(@$%@^Aa^AA^^JHC%^X", you know what happens when you try to print a binary file.)
ASCII files are portable (for the most part). They can be moved from machine to machine with very little trouble. Binary files are almost certainly nonportable. Unless you are an expert programmer, it is almost impossible to make a portable binary file.
Which file type should you use? In most cases, ASCII is best. If you have small to medium amounts of data, the conversion time does not seriously affect the performance of your program. (Who cares if it takes 0.5 seconds to start up instead of 0.3?) ASCII files also make it easy to verify the data.
Only when you are using large amounts of data will the space and performance problems force you to use the binary format.
The End-of-Line Puzzle
Back in the dark ages BC (Before Computers), there existed a magical device called a Teletype Model 33. This amazing machine contained a shift register made out of a motor and a rotor as well as a keyboard ROM consisting solely of levers and springs.
The Teletype contained a keyboard, printer, and paper tape reader/punch. It could transmit messages over telephones using a modem at the blazing rate of 10 characters per second.
But Teletype had a problem. It took 0.2 seconds to move the printhead from the right side to the left. 0.2 seconds is two character times. If a second character came while the printhead was in the middle of a return, that character was lost.
The Teletype people solved this problem by making end-of-line two characters: <carriage return> to position the printhead at the left margin, and <line feed> to move the paper up one line. That way the <line feed> "printed" while the printhead was racing back to the left margin.
When the early computers came out, some designers realized that using two characters for end-of-line wasted storage (at this time storage was very expensive). Some picked <line feed> for their end-of-line, and some chose <carriage return>. Some of the die-hards stayed with the two-character sequence.
Unix uses <line feed> for end-of-line. The newline character \n is code 0xA (LF or <line feed>).
MS-DOS/Windows uses the two characters <carriage return><line feed>. Compiler designers had problems dealing with the old C programs that thought newline was just <line feed>. The solution was to add code to the I/O library that stripped out the <carriage return> characters from ASCII input files and changed <line feed> to <carriage return><line feed> on output.
In MS-DOS/Windows, whether or not a file is opened as ASCII or binary is important to note. The flag std::ios::binary is used to indicate a binary file:
// Open ASCII file for reading
ascii_file.open("name", std::ios::in);
// Open binary file for reading
binary_file.open("name", std::ios::in|std::ios::binary);
Unix programmers don’t have to worry about the C++ library automatically fixing their ASCII files. In Unix, a file is a file, and ASCII is no different from binary. In fact, you can write a half-ASCII/half-binary file if you want to.
The member function put can be used to write out a single byte of a binary file. The following program (shown in Example 16-4) writes numbers 0 to 127 to a file called test.out. It works just fine in Unix, creating a 128-byte long file; however, in MS-DOS/Windows, the file contains 129 bytes. Why?
Example 16-4:
wbin/wbin.cpp
#include <iostream>
#include <fstream>
#include <cstdlib>
int main( )
{
    int cur_char; // current character to write
    std::ofstream out_file; // output file
    out_file.open("test.out", std::ios::out);
    if (out_file.fail( )) {
        std::cerr << "Can not open output file\n";
        exit (8);
    }
    for (cur_char = 0; cur_char < 128; ++cur_char) {
        out_file.put(static_cast<char>(cur_char)); // write one byte
    }
    return (0);
}
Hint: Here is a hex dump of the MS-DOS/Windows file:
000:0001 0203 0405 0607 0809 0d0a 0b0c 0d0e
010:0f10 1112 1314 1516 1718 191a 1b1c 1d1e
020:1f20 2122 2324 2526 2728 292a 2b2c 2d2e
030:2f30 3132 3334 3536 3738 393a 3b3c 3d3e
040:3f40 4142 4344 4546 4748 494a 4b4c 4d4e
050:4f50 5152 5354 5556 5758 595a 5b5c 5d5e
060:5f60 6162 6364 6566 6768 696a 6b6c 6d6e
070:6f70 7172 7374 7576 7778 797a 7b7c 7d7e
080:7f
Binary I/O
Binary I/O is accomplished through two member functions: read and write. The syntax for read is:
in_file.read(data_ptr, size);
- data_ptr
- Pointer to a place to put the data.
- size
- Number of bytes to be read.
The member function gcount returns the number of bytes gotten by the last read. This may be less than the number of bytes requested. For example, the read might encounter an end-of-file or error:
struct {
    int width;
    int height;
} rectangle;
in_file.read(reinterpret_cast<char *>(&rectangle), sizeof(rectangle));
if (in_file.bad( )) {
    std::cerr << "Unable to read rectangle\n";
    exit (8);
}
if (in_file.gcount( ) != sizeof(rectangle)) {
    std::cerr << "Error: Unable to read full rectangle\n";
    std::cerr << "I/O error or EOF encountered\n";
}
In this example you are reading in the structure rectangle. The & operator makes rectangle into a pointer. The cast reinterpret_cast<char *> is needed since read wants a character array, and a static_cast cannot convert between unrelated pointer types. The sizeof operator is used to determine how many bytes to read as well as to check that read was successful.
The member function write has a calling sequence similar to read:
out_file.write(data_ptr, size);
Buffering Problems
Buffered I/O does not write immediately to the file. Instead, the data is kept in a buffer until there is enough for a big write, or until the buffer is flushed. The following program is designed to print a progress message as each section is finished.
std::cout << "Starting";
do_step_1( );
std::cout << "Step 1 complete";
do_step_2( );
std::cout << "Step 2 complete";
do_step_3( );
std::cout << "Step 3 complete\n";
Instead of writing the messages as each step completes, std::cout puts them in a buffer. Only after the program is finished does the buffer get flushed, and all the messages come spilling out at once.
The I/O manipulator std::flush forces the flushing of the buffers. Properly written, the above example should be:
std::cout << "Starting" << std::flush;
do_step_1( );
std::cout << "Step 1 complete" << std::flush;
do_step_2( );
std::cout << "Step 2 complete" << std::flush;
do_step_3( );
std::cout << "Step 3 complete\n" << std::flush;
Because each output statement ends with a std::flush, the output is displayed immediately. This means that our progress messages come out on time.
TIP:
The C++ I/O classes buffer output. std::cerr is the exception: it is flushed after every output operation. Output to std::cout is typically line buffered when it goes to a terminal, so each newline forces a flush there, but this is not guaranteed when output is redirected to a file. Also, std::cout is tied to std::cin, so it is flushed automatically just before the program reads from std::cin. This makes it possible to write prompts without having to worry about buffering:
std::cout << "Enter a value: "; // Note: No flush
std::cin >> value;
Unbuffered I/O
In buffered I/O, data is buffered and then sent to the file. In unbuffered I/O, the data is immediately sent to the file.
If you drop a number of paperclips on the floor, you can pick them up in buffered or unbuffered mode. In buffered mode, you use your right hand to pick up a paper clip and transfer it to your left hand. The process is repeated until your left hand is full, then you dump a handful of paperclips into the box on your desk.
In unbuffered mode, you pick up a paperclip and dump it into the box. There is no left-hand buffer.
In most cases, buffered I/O should be used instead of unbuffered. In unbuffered I/O, each read or write requires a system call. Any call to the operating system is expensive. Buffered I/O minimizes these calls.
Unbuffered I/O should be used only when reading or writing large amounts of binary data or when direct control of a device or file is required.
Back to the paperclip example–if you were picking up small items like paperclips, you would probably use a left-hand buffer. But if you were picking up cannon balls (which are much larger), no buffer would be used.
The open system call is used for opening an unbuffered file. The macro definitions used by this call differ from system to system. Since the examples have to work for both Unix and MS-DOS/Windows, conditional compilation (#ifdef/#endif) is used to bring in the correct files:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#ifdef __MSDOS__ // If we are MS-DOS
#include <io.h> // Get the MS-DOS include file for raw I/O
#else /* __MSDOS__ */
#include <unistd.h> // Get the Unix include file for raw I/O
#endif /* __MSDOS__ */
The syntax for an open call is:
int file_descriptor = open(name, flags); // Existing file
file_descriptor = open(name, flags, mode); // New file
- file_descriptor
- An integer that is used to identify the file for the read, write, and close calls. If file_descriptor is less than 0, an error occurred.
- name
- Name of the file.
- flags
- Defined in the fcntl.h header file. Open flags are described in Table 16-6.
- mode
- Protection mode for the file. Normally this is 0644.
Flag | Meaning |
---|---|
O_RDONLY | Open for reading only. |
O_WRONLY | Open for writing only. |
O_RDWR | Open for reading and writing. |
O_APPEND | Append new data at the end of the file. |
O_CREAT | Create file (the file mode parameter required when this flag is present). |
O_TRUNC | If the file exists, truncate it to 0 length. |
O_EXCL | Fail if file exists. |
O_BINARY | Open in binary mode (older Unix systems may not have this flag). |
For example, to open the existing file data.txt in text mode for reading, you use the following:
data_fd = open("data.txt", O_RDONLY);
The next example shows how to create a file called output.dat for writing only:
out_fd = open("output.dat", O_CREAT|O_WRONLY, 0666);
Notice that you combined flags using the OR (|) operator. This is a quick and easy way of merging multiple flags.
When any program is initially run, three files are already opened. These are described in Table 16-7.
File number | Description |
---|---|
0 | Standard in |
1 | Standard out |
2 | Standard error |
The format of the read call is:
read_size = read(file_descriptor, buffer, size);
- read_size
- The actual number of bytes read. A 0 indicates end-of-file, and a negative number indicates an error.
- file_descriptor
- File descriptor of an open file.
- buffer
- Pointer to a place to put the data that is read from the file.
- size
- Size of the data to be read. This is the size of the request. The actual number of bytes read may be less than this. (For example, you may run out of data.)
The format of a write call is:
write_size = write(file_descriptor, buffer, size);
- write_size
- Actual number of bytes written. A negative number indicates an error.
- file_descriptor
- File descriptor of an open file.
- buffer
- Pointer to the data to be written.
- size
- Size of the data to be written. The system will try to write this many bytes, but if the device is full or there is some other problem, a smaller number of bytes may be written.
Finally, the close call closes the file:
flag = close(file_descriptor);
- flag
- 0 for success, negative for error.
- file_descriptor
- File descriptor of an open file.
Example 16-5 copies a file. Unbuffered I/O is used because of the large buffer size. It makes no sense to use buffered I/O to read 1K of data into a buffer (using an std::ifstream) and then transfer it into a 16K buffer.
Example 16-5:
copy2/copy2.cpp
/****************************************
* copy -- copy one file to another. *
* *
* Usage *
* copy <from> <to> *
* *
* <from> -- the file to copy from *
* <to> -- the file to copy into *
****************************************/
#include <iostream>
#include <cstdlib>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#ifdef __WIN32__ // if we are Windows32
#include <io.h> // Get the Windows32 include file for raw i/o
#else /* __WIN32__ */
#include <unistd.h> // Get the Unix include file for raw i/o
#endif /* __WIN32__ */
const int BUFFER_SIZE = (16 * 1024); // use 16k buffers
int main(int argc, char *argv[])
{
char buffer[BUFFER_SIZE]; // buffer for data
int in_file; // input file descriptor
int out_file; // output file descriptor
int read_size; // number of bytes on last read
if (argc != 3) {
std::cerr << "Error:Wrong number of arguments\n";
std::cerr << "Usage is: copy <from> <to>\n";
exit(8);
}
in_file = open(argv[1], O_RDONLY);
if (in_file < 0) {
std::cerr << "Error:Unable to open " << argv[1] << '\n';
exit(8);
}
out_file = open(argv[2], O_WRONLY | O_TRUNC | O_CREAT, 0666);
if (out_file < 0) {
std::cerr << "Error:Unable to open " << argv[2] << '\n';
exit(8);
}
while (true) {
read_size = read(in_file, buffer, sizeof(buffer));
if (read_size == 0)
break; // end of file
if (read_size < 0) {
std::cerr << "Error:Read error\n";
exit(8);
}
write(out_file, buffer, (unsigned int) read_size);
}
close(in_file);
close(out_file);
return (0);
}
Several things should be noted about this program. First of all, the buffer size is defined as a constant, so it is easily modified. Rather than have to remember that 16K is 16,384, the programmer used the expression (16 * 1024). This form of the constant is obviously 16K.
If the user improperly uses the program, an error message results. To help the user get it right, the message tells how to use the program.
You may not read a full buffer for the last read. That is why read_size is used to determine the number of bytes to write.
Designing File Formats
Suppose you are designing a program to produce a graph. The height, width, limits, and scales are to be defined in a graph configuration file. You are also assigned to write a user-friendly program that asks the operator questions and writes a configuration file so he or she does not have to learn the text editor. How should you design a configuration file?
One way would be as follows:
height (in inches)
width (in inches)
x lower limit
x upper limit
y lower limit
y upper limit
x-scale
y-scale
A typical plotter configuration file might look like:
10.0
7.0
0
100
30
300
0.5
2.0
This file does contain all the data, but in looking at it, you have trouble identifying what, for example, is the value of the Y lower limit. A solution is to comment the file so the configuration program writes out not only the data, but also a string describing the data.
10.0 height (in inches)
7.0 width (in inches)
0 x lower limit
100 x upper limit
30 y lower limit
300 y upper limit
0.5 x-scale
2.0 y-scale
Now the file is human-readable. But suppose a user runs the plot program and types in the wrong filename, and the program gets the lunch menu for today instead of a plot configuration file. The program is probably going to get very upset when it tries to construct a plot whose dimensions are "BLT on white" versus "Meatloaf and gravy."
The result is that you wind up with egg on your face. There should be some way of identifying a file as a plot configuration file. One method of doing this is to put the words "Plot Configuration File" on the first line of the file. Then, when someone tries to give your program the wrong file, the program will print an error message.
This takes care of the wrong file problem, but what happens when you are asked to enhance the program and add optional logarithmic plotting? You could simply add another line to the configuration file, but what about all those old files? It’s not reasonable to ask everyone to throw them away. The best thing to do (from a user’s point of view) is to accept old format files. You can make this easier by putting a version number in the file.
A typical file now looks like:
Plot Configuration File V1.0
log Logarithmic or normal plot
10.0 height (in inches)
7.0 width (in inches)
0 x lower limit
100 x upper limit
30 y lower limit
300 y upper limit
0.5 x-scale
2.0 y-scale
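The identification and version check described above can be sketched as a small function that examines the first line of the file. This is only an illustration: the name plot_file_version and the choice to return the version through a reference parameter are assumptions, not part of the original program.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: examines the first line of a file. Returns true
// and stores the version number if the line identifies a plot
// configuration file; returns false for any other kind of file.
bool plot_file_version(const std::string &first_line, double &version)
{
    const std::string tag = "Plot Configuration File V";

    // The line must begin with the identifying words.
    if (first_line.compare(0, tag.size(), tag) != 0)
        return false;

    // Whatever follows the identifying words must be a number.
    std::istringstream rest(first_line.substr(tag.size()));
    return static_cast<bool>(rest >> version);
}
```

A caller would read the first line with std::getline, print an error message if the function returns false, and otherwise use the version number to decide which lines to expect in the rest of the file.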
In binary files, it is common practice to put an identification number in the first four bytes of the file. This is called the magic number. The magic number should be different for each type of file.
One method for choosing a magic number is to start with the first four letters of the program name (e.g., list) and convert them to hex: 0x6c697374. Then add 0x80808080 to the number: 0xECE9F3F4.
This generates a magic number that is probably unique. The high bit is set on each byte to make the byte non-ASCII and avoid confusion between ASCII and binary files. On most Unix systems and Linux, you’ll find a file called /etc/magic, which contains information on other magic numbers used by various programs.
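As a rough sketch, the recipe can be written as a function. The name make_magic is made up for this illustration, and the code uses a bitwise OR rather than an addition; for ASCII letters, whose high bit is always clear, the two give the same result.

```cpp
#include <cstdint>

// Hypothetical helper: packs four ASCII letters into 32 bits (first
// letter in the most significant byte) and sets the high bit of each
// byte so that no byte of the result is an ASCII character.
std::uint32_t make_magic(const char letters[4])
{
    std::uint32_t value = 0;

    for (int i = 0; i < 4; ++i)
        value = (value << 8) | static_cast<unsigned char>(letters[i]);

    return value | 0x80808080u;
}
```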
When reading and writing a binary file containing many different types of structures, it is easy to get lost. For example, you might read a name structure when you expected a size structure. This is usually not detected until later in the program. To locate this problem early, you can put magic numbers at the beginning of each structure. Then if the program reads the name structure and the magic number is not correct, it knows something is wrong.
Magic numbers for structures do not need to have the high bit set on each byte. Making the magic number just four ASCII characters makes it easy to pick out the beginning of structures in a file dump.
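A per-structure magic number might look something like the following sketch. The structure layout, the names name_record, init_name, and is_name_record, and the value NAME_MAGIC are all invented for the example. Note that the order in which the four letters appear in a file dump depends on the byte order of the machine that wrote the file.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical magic number: the ASCII letters "NAME" packed into 32 bits.
const std::uint32_t NAME_MAGIC = 0x4E414D45;

struct name_record {
    std::uint32_t magic;    // Must equal NAME_MAGIC
    char name[40];          // The name itself (C-style string)
};

// Fills in a record and stamps it with the magic number.
void init_name(name_record &rec, const char *text)
{
    rec.magic = NAME_MAGIC;
    std::strncpy(rec.name, text, sizeof(rec.name) - 1);
    rec.name[sizeof(rec.name) - 1] = '\0';
}

// Returns true if a record just read from a file carries the right stamp.
bool is_name_record(const name_record &rec)
{
    return rec.magic == NAME_MAGIC;
}
```

The reading code calls is_name_record immediately after each read, so a misplaced structure is reported at the point where it was read instead of much later in the program.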
C-Style I/O Routines
C++ allows you to use the C I/O library in C++ programs. Many times this occurs because someone took a C program, translated it to C++, and didn’t want to bother translating the I/O calls. In some cases, the old C library is better and easier to use than the new C++ library. For example, C string-conversion routines such as std::sscanf and std::sprintf use a far more compact formatting specification system than their C++ counterparts. (Note that it is a matter of taste whether or not compact is better.)
The declarations for the structures and functions used by the C I/O functions are stored in the standard include file <cstdio>.
The declaration for a file variable is:
std::FILE *file_variable;    /* Comment */
For example:
#include <cstdio>
std::FILE *in_file; /* File containing the input data */
Before a file can be used, it must be opened using the function std::fopen. std::fopen returns a pointer to the file structure for the file. The format for std::fopen is:
file_variable = std::fopen(name, mode);
- file_variable
- A file variable.
- name
- Actual name of the file ("data.txt", "temp.dat", etc.).
- mode
- Indicates whether the file is to be read or written. Mode is "w" for writing and "r" for reading.
The function std::fclose closes the file. The format of std::fclose is:
status = std::fclose(file_variable);
The variable status will be zero if the std::fclose was successful or nonzero for an error.
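Putting the two calls together, a minimal sketch of the open, use, close sequence might look like this. The function name write_greeting and the text it writes are placeholders chosen for the example.

```cpp
#include <cstdio>

// Hypothetical helper: writes one line to the named file and reports
// whether every step (open, write, close) succeeded.
bool write_greeting(const char *name)
{
    std::FILE *out_file = std::fopen(name, "w");
    if (out_file == NULL)
        return false;           // The file could not be opened

    bool ok = (std::fputs("Hello\n", out_file) != EOF);

    // std::fclose returns zero on success and nonzero on error.
    if (std::fclose(out_file) != 0)
        ok = false;

    return ok;
}
```

Checking the status of std::fclose matters for output files, because buffered data may not actually be written until the file is closed.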
C provides three preopened files. These are listed in Table 16-8.
File | Description
---|---
stdin | Standard input (open for reading). Equivalent to C++’s cin.
stdout | Standard output (open for writing). Equivalent to C++’s cout.
stderr | Standard error (open for writing). Equivalent to C++’s cerr.
(There is no C file equivalent to C++’s clog.)
The function std::fgetc reads a single character from a file. If there is no more data in the file, the function returns the constant EOF (EOF is defined in cstdio). Note that std::fgetc returns an integer, not a character. This is necessary because the EOF flag must be a noncharacter value.
Example 16-6 counts the number of characters in the file input.txt.
Example 16-6:
copy/copy.cpp
#include <cstdio>
#include <cstdlib>      /* ANSI Standard C file */
#include <iostream>
const char FILE_NAME[] = "input.txt";   // Name of the input file
int main( )
{
    int count = 0;          // number of characters seen
    std::FILE *in_file;     // input file
    int ch;                 // character or EOF flag from input
    in_file = std::fopen(FILE_NAME, "rb");
    if (in_file == NULL) {
        std::cerr << "Can not open " << FILE_NAME << '\n';
        exit(8);
    }
    while (true) {
        ch = std::fgetc(in_file);
        if (ch == EOF)
            break;
        ++count;
    }
    std::cout << "Number of characters in " << FILE_NAME <<
        " is " << count << '\n';
    std::fclose(in_file);
    return (0);
}
A similar function, std::fputc, exists for writing a single character. Its format is:
std::fputc(character, file);
The functions std::fgets and std::fputs work on one line at a time. The format of the std::fgets call is:
line_ptr = std::fgets(line, size, file);
- line_ptr
- Equal to line if the read was successful, or NULL if EOF or an error is detected.
- line
- A character array where the function places the line.
- size
- The size of the character array. std::fgets reads until it gets a line (complete with ending \n) or it reads size – 1 characters. It then ends the string with a null (\0).
For example:
char line[100];
. . .
std::fgets(line, sizeof(line), in_file);
std::fputs is similar to std::fgets except that it writes a line instead of reading one. The format of the std::fputs function is:
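status = std::fputs(line, file);
The parameters to std::fputs are similar to the ones for std::fgets. std::fputs needs no size because it gets the size of the line to write from the length of the line. (It keeps writing until it hits a null character, '\0'.) Unlike std::fgets, it returns an integer, not a pointer: a nonnegative value if the write was successful, or EOF if an error occurred.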
TIP:
The C++ function getline reads and discards the end-of-line character ('\n'). The C std::fgets reads the entire line, including the end-of-line, and stores it in the buffer. So the '\n' is put in the buffer when you use std::fgets. This can sometimes cause surprising results.
C-Style Conversion Routines
C++ uses the << operator for formatted output and the >> operator for formatted input. C has its own set of output functions (the std::printf family) and input conversion functions (the std::scanf functions). This section goes into the details of these C-style conversion routines.
The std::printf Family of Output Functions
C uses the std::printf function call and related functions for output. A std::printf call consists of two parts: a format that describes how to print the data and a list of data to print.
The general form of the std::printf call is:
std::printf(format, parameter-1, parameter-2, …);
The format string is printed exactly. For example:
std::printf("Hello World\n");
prints:
Hello World
To print a number, you must put a % conversion in the format string. For example, when C sees %d in the format string, it takes the next parameter from the parameter list (which must be an integer) and prints it.
Figure 16-1 shows how the elements of the std::printf statement work to generate the final result.
The conversion %d is used for integers. Other types of parameters use different conversions. For example, if you want to print a floating-point number, you need a %f conversion. Table 16-9 lists the conversions.
Conversion | Variable type
---|---
%d | int
%ld | long int
%d | short int
%f | float
%lf | double
%u | unsigned int
%lu | unsigned long int
%u | unsigned short int
%s | char * (C-style string)
%c | char
%o | int (prints octal)
%x | int (prints hexadecimal)
%e | float (in the form d.dddE+dd)
Many additional conversions also can be used in the std::printf statement. See your reference manual for details.
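To see several conversions working together, here is a small sketch in which each conversion in the format string is matched with an argument of the corresponding type. The function name print_sample and the values are invented for the illustration.

```cpp
#include <cstdio>

// Hypothetical helper: prints one value of each of several types and
// returns whatever std::printf returns (the number of characters
// printed, or a negative value on error).
int print_sample()
{
    int count = 42;
    long int big = 100000L;
    double ratio = 2.5;
    const char *name = "plot";
    char grade = 'A';
    unsigned int mask = 255;

    // prints: 42 100000 2.500000 plot A ff
    return std::printf("%d %ld %f %s %c %x\n",
                       count, big, ratio, name, grade, mask);
}
```

The conversions and the arguments must line up one for one and type for type; nothing in the call itself enforces this.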
The std::printf function does not check for the correct number of parameters on each line. If you add too many, the extra parameters are ignored. If you add too few, C will make up values for the missing parameters. Also C does not type check parameters, so if you use a %d on a floating point number, you will get strange results.
Why does 2 + 2 = 5986? (Your results may vary.)
Example 16-7:
two/two.c
#include <cstdio>
int main( )
{
    int answer;
    answer = 2 + 2;
    std::printf("The answer is %d\n");
    return (0);
}
Why does 21 / 7 = 0? (Your results may vary.)
Example 16-8:
float3/float3.c
#include <cstdio>
int main( )
{
    float result;
    result = 21.0 / 7.0;
    std::printf("The result is %d\n", result);
    return (0);
}
The function std::fprintf is similar to std::printf except that it takes one additional argument, the file to print to:
std::fprintf(file, format, parameter-1, parameter-2, …);
Another flavor of the std::printf family is the std::sprintf call. The first parameter of std::sprintf is a C-style string. The function formats the output and stores the result in the given string:
std::sprintf(string, format, parameter-1, parameter-2, …);
For example:
char file_name[40]; /* The filename */
/* Current file number for this segment */
int file_number = 0;
std::sprintf(file_name, "file.%d", file_number);
++file_number;
out_file = std::fopen(file_name, "w");
WARNING:
The return value of std::sprintf differs from system to system. The ANSI standard defines it as the number of characters stored in the string; however, some implementations of Unix C define it to be a pointer to the string.
The std::scanf Family of Input Functions
Reading is accomplished through the std::scanf family of calls. The std::scanf function is similar to std::printf in that it has sister functions: std::fscanf and std::sscanf. The std::scanf function reads the standard input (stdin in C terms, cin in C++ terms), parses the input, and stores the results in the parameters in the parameter list.
The format for a std::scanf function call is:
number = std::scanf(format, &parameter1, . . .);
- number
- Number of parameters successfully converted.
- format
- Describes the data to be read.
- parameter1
- First parameter to be read. Note the & in front of the parameter. These parameters must be passed by address.
WARNING:
If you forget to put & in front of each variable for std::scanf, the result can be a "Segmentation violation core dumped" or "Illegal memory access" error. In some cases a random variable or instruction will be modified. This is not common on Unix machines, but MS-DOS/Windows, with its lack of memory protection, cannot easily detect this problem. In MS-DOS/Windows, omitting & can cause a system crash.
There is one problem with this std::scanf: it’s next to impossible to get the end-of-line handling right. However, there’s a simple way to get around the limitations of std::scanf–don’t use it. Instead, use std::fgets followed by the string version of std::scanf, the function std::sscanf:
char line[100]; // Line for data
std::fgets(line, sizeof(line), stdin); // Read numbers
std::sscanf(line, "%d %d", &number1, &number2);
Finally, there is a file version of std::scanf, the function std::fscanf. It’s identical to std::scanf except the first parameter is the file to be read. Again, this function is extremely difficult to use correctly and should be avoided. Use std::fgets and std::sscanf instead.
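The std::fgets plus std::sscanf combination can be wrapped in a small function that also checks the number of parameters converted. This is a sketch only; the name read_two_numbers and the buffer size are assumptions made for the example.

```cpp
#include <cstdio>

// Hypothetical helper: reads one line from in_file and extracts two
// integers from it. Returns true only if a line was read and both
// numbers were successfully converted.
bool read_two_numbers(std::FILE *in_file, int &number1, int &number2)
{
    char line[100];     // Line for data

    if (std::fgets(line, sizeof(line), in_file) == NULL)
        return false;   // End of file or read error

    // std::sscanf returns the number of parameters converted.
    return std::sscanf(line, "%d %d", &number1, &number2) == 2;
}
```

Because each call consumes exactly one line, a malformed line cannot leave leftover characters behind to confuse the next read.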
C-Style Binary I/O
Binary I/O is accomplished through two routines: std::fread and std::fwrite. The syntax for std::fread is:
read_size = std::fread(data_ptr, 1, size, file);
- read_size
- Size of the data that was read. If this is less than size, an end-of-file or error occurred.
- data_ptr
- Pointer to a buffer to receive the data being read.
- 1
- The constant 1. (For the reason behind this constant, see the sidebar.)
- size
- Number of bytes to be read.
- file
- Input file.
Why 1?
For example:
struct {
    int width;
    int height;
} rectangle;
if (std::fread(reinterpret_cast<char *>(&rectangle), 1,
        sizeof(rectangle), in_file) != sizeof(rectangle)) {
    std::fprintf(stderr, "Unable to read rectangle\n");
    exit(8);
}
In this example you are reading in the structure rectangle. The & operator takes the address of the structure. The cast reinterpret_cast<char *> turns that address into a pointer to raw bytes. (A static_cast cannot perform this conversion. The cast is also optional, because std::fread takes a void * and any object pointer converts to that type implicitly.) The sizeof operator is used to determine how many bytes to read in as well as to check that the read was successful.
std::fwrite has a calling sequence similar to std::fread:
write_size = std::fwrite(data_ptr, 1, size, file);
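A write followed by a read of the same structure can be sketched as a pair of functions. The names write_rectangle and read_rectangle are invented here, and the structure is given a name so it can be used as a parameter type. In both calls the second argument (1) is the size of one element and the third is the number of elements, so the return value is a count of bytes.

```cpp
#include <cstdio>

struct rectangle {
    int width;
    int height;
};

// Hypothetical helper: writes the raw bytes of rect to out_file.
// Returns true if every byte was written.
bool write_rectangle(std::FILE *out_file, const rectangle &rect)
{
    return std::fwrite(&rect, 1, sizeof(rect), out_file) == sizeof(rect);
}

// Hypothetical helper: reads the raw bytes of one rectangle from
// in_file. Returns true if a complete structure was read.
bool read_rectangle(std::FILE *in_file, rectangle &rect)
{
    return std::fread(&rect, 1, sizeof(rect), in_file) == sizeof(rect);
}
```

A file written this way is only guaranteed to be readable by a program built with the same structure layout and byte order, and the file should be opened in binary mode.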
No matter what filename you give Example 16-9, std::fopen can’t find it. Why?
Example 16-9:
fun-file/fun-file.cpp
#include <cstdio>
#include <cstdlib>
int main( )
{
    char name[100];         /* name of the file to use */
    std::FILE *in_file;     /* file for input */
    std::printf("Name? ");
    std::fgets(name, sizeof(name), stdin);
    in_file = std::fopen(name, "r");
    if (in_file == NULL) {
        std::fprintf(stderr, "Could not open file\n");
        exit(8);
    }
    std::printf("File found\n");
    std::fclose(in_file);
    return (0);
}
C- Versus C++- Style I/O
Both C- and C++- style I/O have their own features and quirks. In this section we’ll discuss some of the differences between these two systems.
Simplicity
Let’s say we want to write a simple checkbook program. We need to print an account statement. We need some code to print each line of the account statement (date, check number, payee, and amount).
In C the print statement looks like:
std::printf("%2d/%2d/%02d %4d: %-40s %6.2f\n",
check.date.month, check.date.day, check.date.year,
check.number, check.payee, check.amount);
In C++ the print statement is:
std::cout << std::setw(2) << check.date.month << '/' <<
    std::setw(2) << check.date.day << '/' <<
    std::setw(2) << std::setfill('0') << check.date.year << ' ' <<
    std::setw(4) << check.number << ':' <<
    std::setw(40) << std::setiosflags(std::ios::left) <<
    check.payee <<
    std::resetiosflags(std::ios::left) << ' ' <<
    std::setw(6) << std::setprecision(2) <<
    std::setiosflags(std::ios::fixed) <<
    check.amount <<
    std::setw(0) << '\n';
From this example we can clearly see that the C-style I/O is more compact. It is not clear that compact is better. This author prefers the compact style of the C std::printf functions, while many others prefer the verbosity of the C++ I/O system. Besides, if you’re a C++ programmer, you probably should program in C++ and not bring legacy I/O systems into the mix.
Although it looks like C is more compact, things are not as obvious as they look. A well-designed date class would have its own output operator. Thus we can simplify our C++ code down to:
std::cout << check.date <<
    std::setw(4) << check.number << ':' <<
    std::setw(40) << std::setiosflags(std::ios::left) <<
    check.payee <<
    std::resetiosflags(std::ios::left) << ' ' <<
    std::setw(6) << std::setprecision(2) <<
    std::setiosflags(std::ios::fixed) <<
    check.amount <<
    std::setw(0) << '\n';
But this assumes that only the date has an output operator. If we designed our check class correctly, it should have one as well. This means that our code now has been simplified down to:
std::cout << check << '\n';
Now this doesn’t mean that complexity has gone away. It’s merely been moved from outside the class to inside it.
This example serves to illustrate one of the key differences between C and C++. In C-style I/O, the information on how to manipulate the data (in this case, how to print it) is contained outside the data itself. In C++ it’s possible to put the manipulation code and the data into a single class.
If we are writing out our checkbook information in only one place, the C version may be simpler and easier to work with. So for simple programs, you may want to consider using C-style I/O. But suppose that we wanted to print out the data to a number of places. If we used C-style I/O, we would have to replicate our format code all over the place or create a small function to do the printing. With C++’s classes, we can keep the printing information in one logical place. (As a person who’s just had to rewrite all the C-style format statements in a rather large piece of code, I can tell you that putting the formatting information in one place, the object, has some advantages.)
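To make the idea of keeping the formatting inside the object concrete, here is one possible sketch of output operators for a date and a check. The structure layouts and member names are assumptions made for this illustration, and zero-padding all three date fields is a choice made here, not something the earlier examples require.

```cpp
#include <iomanip>
#include <iostream>
#include <string>

// Hypothetical data layouts for the checkbook example.
struct date {
    int month, day, year;
};

struct check {
    date when;              // Date the check was written
    int number;             // Check number
    std::string payee;      // Who the check was written to
    double amount;          // Amount of the check
};

// Prints a date as mm/dd/yy, then restores the stream's fill character.
std::ostream &operator<<(std::ostream &out, const date &d)
{
    char old_fill = out.fill('0');
    out << std::setw(2) << d.month << '/'
        << std::setw(2) << d.day << '/'
        << std::setw(2) << d.year;
    out.fill(old_fill);
    return out;
}

// Prints one statement line, then restores the stream's format state.
std::ostream &operator<<(std::ostream &out, const check &c)
{
    std::ios::fmtflags old_flags = out.flags();
    std::streamsize old_precision = out.precision();

    out << c.when << ' ' << std::setw(4) << c.number << ": "
        << std::left << std::setw(40) << c.payee << std::right << ' '
        << std::fixed << std::setprecision(2) << std::setw(6) << c.amount;

    out.flags(old_flags);
    out.precision(old_precision);
    return out;
}
```

With these operators in place, every part of the program that prints a check uses the same format, and a change to the format is made in exactly one function.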
Reliability
When you use C++-style I/O, the system automatically detects the type of the variable and performs the appropriate conversion. It’s impossible to get the types wrong.
With C-style I/O, it’s easy to get the arguments to a std::printf mixed up, resulting in very strange results when you run the program. What’s worse, many compilers do not check std::printf calls, so they give no warning when you make a mistake.
One special C I/O function you should be aware of is std::gets. This function gets a line from standard input with no bounds-checking. So:
std::gets(line);
is exactly like:
std::fgets(line, INFINITY, stdin);
If there are too many characters in an input line, the std::gets function will cause a buffer overflow and trash memory. This single function and its lack of bounds-checking has to be responsible for more crashes and security holes than any other single C function.[2] You should never use it. You can get in enough trouble with the more reliable C functions without having to play Russian roulette with this one.
Speed
I’ve done some benchmarks on C and C++ I/O for binary files. In general I’ve found the C I/O to be much faster. That’s because the C I/O system is less flexible and has to deal with less overhead than the C++ system.
TIP:
I’m not talking about formatted I/O, just raw binary I/O. If you do formatted I/O in either system, you can expect your speed to go down tremendously. It’s the single slowest system in the entire C and C++ library.
Which Should You Use?
Which I/O system is best? That depends on a large number of factors. First of all, any system you know is always going to be easier to use and more reliable than a system you don’t know.
However, if you know both systems, C-style I/O is good for the simple stuff. If you’re not doing anything fancy with classes and just want to write simple formatted reports, the C I/O system will do the job. However, for larger jobs, the object-oriented C++ I/O system handles complexity and organizes complex information much better than C-style I/O.
But if you’re learning I/O for the first time, I suggest that you stick with one I/O system, the C++ one. Learn C-style I/O only if you’re forced to. (Say, for instance, you have to maintain some legacy code that uses the old C-style system.)
Programming Exercises
Write a program that reads a file and counts the number of lines in it.
Write a program to copy a file, expanding all tabs to multiple spaces. (For historical reasons–the Teletype again–almost all text files use a tab setting of 8 characters.)
Write a program that reads a file containing a list of numbers and writes two files, one containing all the numbers divisible by 3 and another containing all the other numbers.
Write a program that reads an ASCII file containing a list of numbers and writes a binary file containing the same list. Write a program that goes the other way so you can check your work.
Write a program that copies a file and removes all characters with the high bit set ((ch & 0x80) != 0).
Design a file format to store a person’s name, address, and other information. Write a program to read this file and produce a file containing a set of mailing labels.
Answers to Chapter Questions
The problem is that you are writing an ASCII file, but you wanted a binary file. In Unix, ASCII is the same as binary, so the program runs fine. In MS-DOS/Windows, the end-of-line issue causes problems. When you write a newline character (0x0a) to the file, a carriage return (0x0D) is added to the file. (Remember that end-of-line in MS-DOS/Windows is <carriage return><line feed>, or 0x0d, 0x0a.) Because of this editing, you get an extra carriage return (0x0d) in the output file.
To write binary data (without output editing) you need to open the file with the binary option:
out_file.open("test.out", std::ios::out | std::ios::binary);
The std::printf call does not check for the correct number of parameters. The statement:
std::printf("The answer is %d\n");
tells the std::printf to print the string "The answer is" followed by the answer. The problem is that the parameter containing the answer was omitted. When this happens, std::printf gets the answer from a random location and prints garbage.
Properly written, the std::printf statement is:
std::printf("The answer is %d\n", answer);
The std::printf call does not check the type of its parameters. You tell std::printf to print an integer number (%d) and supply it with a floating-point parameter (result). This mismatch causes unexpected results, such as printing the wrong answer.
When printing a floating-point number, you need a %f conversion. Properly written, our std::printf statement is:
std::printf("The result is %f\n", result);
The problem is that std::fgets gets the entire line, including the newline character (\n). If you have a file named sam, the program reads sam\n and tries to look for a file by that name. Because there is no such file, the program reports an error.
The fix is to strip the newline character from the name:
name[strlen(name) - 1] = '\0'; /* Get rid of last character */
The error message in this case is poorly designed. True, you did not open the file, but the programmer could supply the user with more information. Are you trying to open the file for input or output? What is the name of the file you are trying to open? You don’t even know whether the message you are getting is an error, a warning, or just part of the normal operation. A better error message is:
std::fprintf(stderr, "Error: Unable to open %s for input\n", name);
Notice that this message would also help us detect the programming error. When you typed in "sam", the error would be:
Error: Unable to open sam
for input
This clearly shows us that you are trying to open a file with a newline in its name.
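Removing the last character works when the string is known to end in a newline, but it misbehaves if the string is empty or if the line was too long to fit in the buffer (in which case there is no newline to remove). A slightly more defensive sketch uses std::strcspn from <cstring>; the function name strip_newline is invented for the example.

```cpp
#include <cstring>

// Hypothetical helper: cuts a C-style string off at its first newline.
// Strings that are empty or contain no newline are left unchanged.
void strip_newline(char *text)
{
    // std::strcspn returns the index of the first '\n', or the length
    // of the string if there is none, so the assignment is always safe.
    text[std::strcspn(text, "\n")] = '\0';
}
```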
1. If you take a look at the C++ standard, you’ll notice that the formal definition of these functions is somewhat more complex. I’ve simplified the definition for this book, but this definition is compatible with the formal one.
2. As I am writing this, Microsoft has just released a security patch to Windows XP to fix a buffer overflow bug.