A look at the file handling methods that Java provides, and an introduction to the complex I/O system that comes packaged in java.io.
File handling is an integral part of nearly all programming projects. Files provide the means by which a program stores data, accesses stored data, or shares data. As a result, there are very few applications that don’t interact with a file in one form or another. Although no aspect of file handling is particularly difficult, a great many classes, interfaces, and methods are involved. Being able to effectively apply them to your projects is the mark of a professional.
It is important to understand that file I/O is a subset of Java’s overall I/O system. Furthermore, Java’s I/O system is quite large. This is not surprising given that it supports two distinct I/O class hierarchies: one for bytes and one for characters. It contains classes that enable a byte array, a character array, or a string to be used as the source or target of I/O operations. It also provides the ability to set or obtain various attributes associated with a file itself, such as its read/write status, whether the file is a directory, or whether it is hidden. You can even obtain a list of files within a directory.
Despite its size, Java’s I/O system is surprisingly easy to use. One reason for this is its well-thought-out design. By structuring the I/O system around a carefully crafted set of classes, this very large API is made manageable. Once you understand how to use the core classes, it’s easy to learn its more advanced capabilities. The I/O system’s consistency makes it easy to maintain or adapt code, and its rich functionality provides solutions to most file handling tasks.
The core of Java’s I/O system is packaged in java.io. It has been included with Java since version 1.0, and it contains the classes and interfaces that you will most often use when performing I/O operations, including those that operate on files. Simply put, when you need to read or write files, java.io is the package that you will normally turn to. As a result, all of the recipes in this chapter use its capabilities in one form or another.
Another package that includes file handling classes is java.util.zip. The classes in java.util.zip can create a compressed file or decompress a file. These classes build on the functionality provided by the I/O classes defined in java.io. Thus, they are integrated into Java’s overall I/O strategy. Three recipes demonstrate the use of data compression when handling files.
This tutorial provides several recipes that demonstrate file handling. It begins by describing several fundamental operations, such as reading and writing bytes or characters. It then shows various techniques that help you utilize and manage files.
Here are the recipes contained in this chapter:
- Read Bytes from a File
- Write Bytes to a File
- Buffer Byte-Based File I/O
- Read Characters from a File
- Write Characters to a File
- Buffer Character-Based File I/O
- Read and Write Random-Access Files
- Obtain File Attributes
- Set File Attributes
- List a Directory
- Compress and Decompress Data
- Create a ZIP file
- Decompress a ZIP file
- Serialize Objects
Beginning with version 1.4, Java began providing an additional approach to I/O called NIO (which stands for New I/O). It creates a channel-based approach to I/O and is packaged in java.nio. The NIO system is not intended to replace the stream-based I/O classes found in java.io. Instead, NIO supplements them. Because the focus of this chapter is stream-based I/O, no NIO-based recipes are included. The interested reader will find a discussion of NIO (and of I/O in general) in my book Java: The Complete Reference.
An Overview of File Handling
In Java, file handling is simply a special case of a larger concept because file I/O is tightly integrated into Java’s overall I/O system. In general, if you understand one part of the I/O system, it’s easy to apply that knowledge to another situation. There are two aspects of the I/O system that make this feature possible. The first is that Java’s I/O system is built on a cohesive set of class hierarchies, at the top of which are abstract classes that define much of the basic functionality shared by all specific concrete subclasses. The second is the stream. The stream ties together the file system because all I/O operations occur through one. Because of the importance of the stream, we will begin this overview of Java’s file handling capabilities there.
Streams
A stream is an abstraction that either produces or consumes information. A stream is linked to a physical device by the I/O system. All streams behave in the same manner, even if the actual physical devices they are linked to differ. Thus, the same I/O classes and methods can be applied to different types of devices. For example, the same methods that you use to write to the console can also be used to write to a disk file or to a network connection. The core Java streams are implemented within class hierarchies defined in the java.io package. These are the streams that you will usually use when handling files. However, some other packages also define streams. For example, java.util.zip supplies streams that create and operate on compressed data.
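As a minimal sketch of this device independence, the following code writes the same message to the console and to an in-memory byte array through one method that knows only about OutputStream. The class name StreamTargets and the method writeMessage( ) are hypothetical names chosen for illustration. For brevity, main( ) declares throws IOException instead of handling the exception.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// StreamTargets is a hypothetical name used only for this sketch.
class StreamTargets {
    // Writes the message to whatever byte stream it is handed.
    // The method neither knows nor cares what device is behind the stream.
    static void writeMessage(OutputStream out, String msg) throws IOException {
        out.write(msg.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Target 1: the console.
        writeMessage(System.out, "hello\n");

        // Target 2: an in-memory byte array.
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        writeMessage(memory, "hello\n");
        System.out.println("bytes captured in memory: " + memory.size());
    }
}
```

Passing a FileOutputStream to the same method would send the message to a disk file without any change to writeMessage( ).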
Modern versions of Java define two types of streams: byte and character. (The original 1.0 version of Java defined only byte streams, but character streams were quickly added.) Byte streams provide a convenient means for handling input and output of bytes. They are used, for example, when reading or writing binary data. They are especially helpful when working with files. Character streams are designed for handling the input and output of characters, which streamlines internationalization.
The fact that Java defines two different types of streams makes the I/O system quite large because two separate class hierarchies (one for bytes, one for characters) are needed. The sheer number of I/O classes can make the I/O system appear more intimidating than it actually is. For the most part, the functionality of byte streams is paralleled by that of the character streams.
One other point: at the lowest level, all I/O is still byte-oriented. The character-based streams simply provide a convenient and efficient means for handling characters.
The Byte Stream Classes
Byte streams are defined by two class hierarchies: one for input and one for output. At the top of these are two abstract classes: InputStream and OutputStream. InputStream defines the characteristics common to byte input streams, and OutputStream describes the behavior of byte output streams. The methods specified by InputStream and OutputStream are shown in Tables 3-1 and 3-2. From InputStream and OutputStream are created several subclasses, which offer varying functionality. These classes are shown in Table 3-3.
Of the byte-stream classes, two are directly related to files: FileInputStream and FileOutputStream. Because these are concrete implementations of InputStream and OutputStream, they can be used any place an InputStream or an OutputStream is needed. For example, an instance of FileInputStream can be wrapped in another byte stream class, such as a BufferedInputStream. This is one reason why Java’s stream-based approach to I/O is so powerful: it enables the creation of a fully integrated class hierarchy.
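The wrapping just described might look like the following sketch, which counts the bytes in a file by reading through a BufferedInputStream layered over a FileInputStream. The class name BufferedByteCount and the method countBytes( ) are hypothetical, and the method passes any IOException to its caller to keep the example short.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// BufferedByteCount is a hypothetical name used only for this sketch.
class BufferedByteCount {
    // Counts the bytes in a file. The variable is typed as InputStream,
    // so the loop works the same whether or not the stream is buffered.
    static long countBytes(String path) throws IOException {
        InputStream in = new BufferedInputStream(new FileInputStream(path));
        try {
            long count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } finally {
            // Closing the outer stream also closes the wrapped FileInputStream.
            in.close();
        }
    }
}
```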
The Character Stream Classes
Character streams are defined by using class hierarchies that are different from the byte streams. The character stream hierarchies are topped by these two abstract classes: Reader and Writer. Reader is used for input, and Writer is used for output. Tables 3-4 and 3-5 show the methods defined by these classes. Concrete classes derived from Reader and Writer operate on Unicode character streams. In general, the character-based classes parallel the byte-based classes. The character stream classes are shown in Table 3-6.
Of the character-stream classes, two are directly related to files: FileReader and FileWriter. Because these are concrete implementations of Reader and Writer, they can be used any place a Reader or Writer is needed. For example, an instance of FileReader can be wrapped in a BufferedReader to buffer input operations.
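Here is a short sketch of that combination. It counts the lines in a text file by wrapping a FileReader in a BufferedReader and calling readLine( ) until it returns null. The names LineCountDemo and countLines( ) are hypothetical. Note that FileReader decodes the file using a default character encoding, which this sketch simply accepts.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// LineCountDemo is a hypothetical name used only for this sketch.
class LineCountDemo {
    // Counts the lines in a text file using buffered character input.
    static int countLines(String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            int lines = 0;
            // readLine() returns null when the end of the file is reached.
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        } finally {
            in.close();
        }
    }
}
```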
Table 3.1 The Methods Defined by InputStream
Method | Description |
int available( ) throws IOException | Returns the number of bytes of input currently available for reading. |
void close( ) throws IOException | Closes the input source. |
void mark(int numBytes) | Places a mark at the current point in the input stream that will remain valid until numBytes bytes are read. Not all streams implement mark( ). |
boolean markSupported( ) | Returns true if mark( )/reset( ) are supported by the invoking stream. |
abstract int read( ) throws IOException | Returns an integer representation of the next available byte of input. –1 is returned when the end of the file is encountered. |
int read(byte buffer[ ]) throws IOException | Attempts to read up to buffer.length bytes into buffer and returns the actual number of bytes that were successfully read. –1 is returned when the end of the file is encountered. |
int read(byte buffer[ ], int offset, int numBytes) throws IOException | Attempts to read up to numBytes bytes into buffer starting at buffer[offset], returning the number of bytes successfully read. –1 is returned when the end of the file is encountered. |
void reset( ) throws IOException | Resets the input pointer to the previously set mark. Not all streams support reset( ). |
long skip(long numBytes) throws IOException | Ignores (that is, skips) numBytes bytes of input, returning the number of bytes actually ignored. |
Table 3.2 The Methods Specified by OutputStream
Method | Description |
void close( ) throws IOException | Closes the output stream. |
void flush( ) throws IOException | Finalizes the output state so that any buffers are cleared. That is, it flushes the output buffers. |
abstract void write(int b) throws IOException | Writes the low-order byte of b to the output stream. |
void write(byte buffer[ ]) throws IOException | Writes a complete array of bytes to the output stream. |
void write(byte buffer[ ], int offset, int numBytes) throws IOException | Writes a subrange of numBytes bytes from the array buffer, beginning at buffer[offset]. |
Table 3.3 The Byte Stream Classes
Byte Stream Class | Description |
BufferedInputStream | Buffered input stream. |
BufferedOutputStream | Buffered output stream. |
ByteArrayInputStream | Input stream that reads from a byte array. |
ByteArrayOutputStream | Output stream that writes to a byte array. |
DataInputStream | An input stream that contains methods for reading Java’s standard data types. |
DataOutputStream | An output stream that contains methods for writing Java’s standard data types. |
FileInputStream | Input stream that reads from a file. |
FileOutputStream | Output stream that writes to a file. |
FilterInputStream | Implements InputStream and allows the contents of another stream to be altered (filtered). |
FilterOutputStream | Implements OutputStream and allows the contents of another stream to be altered (filtered). |
InputStream | Abstract class that describes stream input. |
OutputStream | Abstract class that describes stream output. |
PipedInputStream | Input pipe. |
PipedOutputStream | Output pipe. |
PrintStream | Output stream that contains print( ) and println( ). |
PushbackInputStream | Input stream that allows bytes to be returned to the stream. |
RandomAccessFile | Supports random-access file I/O. |
SequenceInputStream | Input stream that is a combination of two or more input streams that will be read sequentially, one after the other. |
A superclass of FileReader is InputStreamReader. It translates bytes into characters. A superclass of FileWriter is OutputStreamWriter. It translates characters into bytes. These classes are necessary because all files are, at their foundation, byte-oriented.
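To make the translation concrete, the following sketch encodes a string to bytes through an OutputStreamWriter and decodes it again through an InputStreamReader, using in-memory byte arrays in place of a file. The names CharByteBridge, encode( ), and decode( ) are hypothetical. The character encoding is passed in by name, which is an assumption of this example, not something the classes require.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// CharByteBridge is a hypothetical name used only for this sketch.
class CharByteBridge {
    // Characters in, bytes out: OutputStreamWriter performs the encoding.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, charsetName);
        try {
            out.write(text);
        } finally {
            out.close(); // close() flushes any pending characters first
        }
        return bytes.toByteArray();
    }

    // Bytes in, characters out: InputStreamReader performs the decoding.
    static String decode(byte[] data, String charsetName) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(data), charsetName);
        try {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        } finally {
            in.close();
        }
    }
}
```

With UTF-8, a character outside the ASCII range occupies more than one byte, so the byte count and the character count can differ.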
The RandomAccessFile Class
The stream classes just described operate on files in a strictly sequential fashion. However, Java also allows you to access the contents of a file in non-sequential order. To do this, you will use RandomAccessFile, which encapsulates a random-access file. RandomAccessFile is not derived from InputStream or OutputStream. Instead, it implements the interfaces DataInput and DataOutput (which are described shortly). RandomAccessFile supports random access because it lets you change the location in the file at which the next read or write operation will occur. This is done by calling its seek( ) method.
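The following sketch shows seek( ) in action. It writes an array of int values to a file and then jumps directly to one of them. Because writeInt( ) always writes four bytes, the value at a given index begins at byte position index * 4. The names RandomAccessDemo and writeThenReadAt( ) are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// RandomAccessDemo is a hypothetical name used only for this sketch.
class RandomAccessDemo {
    // Writes each value as a 4-byte int, then reads back the one at index.
    static int writeThenReadAt(String path, int[] values, int index) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(path, "rw");
        try {
            raf.setLength(0); // discard any previous contents
            for (int v : values) {
                raf.writeInt(v);
            }
            // Move the file pointer to the start of the requested value.
            raf.seek((long) index * 4);
            return raf.readInt();
        } finally {
            raf.close();
        }
    }
}
```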
Table 3.4 The Methods Defined by Reader
Method | Description |
abstract void close( ) throws IOException | Closes the input source. |
void mark(int numChars) throws IOException | Places a mark at the current point in the input stream that will remain valid until numChars characters are read. Not all streams support mark( ). |
boolean markSupported( ) | Returns true if mark( )/reset( ) are supported on this stream. |
int read( ) throws IOException | Returns an integer representation of the next available character from the input stream. –1 is returned when the end of the file is encountered. |
int read(char buffer[ ]) throws IOException | Attempts to read up to buffer.length characters into buffer and returns the actual number of characters that were successfully read. –1 is returned when the end of the file is encountered. |
abstract int read(char buffer[ ], int offset, int numChars) throws IOException | Attempts to read up to numChars characters into buffer starting at buffer[offset], returning the number of characters successfully read. –1 is returned when the end of the file is encountered. |
boolean ready( ) throws IOException | Returns true if input is pending. Otherwise, it returns false. |
void reset( ) throws IOException | Resets the input pointer to the previously set mark. Not all streams support reset( ). |
long skip(long numChars) throws IOException | Skips over numChars characters of input, returning the number of characters actually skipped. |
The File Class
In addition to the classes that support file I/O, Java provides the File class, which encapsulates information about a file. This class is extremely useful when manipulating a file itself (rather than its contents) or the file system of the computer. For example, using File you can determine if a file is hidden, set a file’s date, set a file to read-only, list the contents of a directory, or create a new directory, among many other things. Thus, File puts the file system under your control. This makes File one of the most important classes in Java’s I/O system.
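A brief sketch of a few of these capabilities follows. It creates a temporary file, reports several of its attributes, marks it read-only, and lists the directory that contains it. The class name FileInfoDemo and the method describe( ) are hypothetical. The values reported depend on the platform and on the permissions of the user running the program.

```java
import java.io.File;
import java.io.IOException;

// FileInfoDemo is a hypothetical name used only for this sketch.
class FileInfoDemo {
    // Summarizes a few attributes that File exposes without opening the file.
    static String describe(File f) {
        return f.getName()
                + " exists=" + f.exists()
                + " directory=" + f.isDirectory()
                + " hidden=" + f.isHidden()
                + " writable=" + f.canWrite()
                + " length=" + f.length();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("demo", ".txt");
        try {
            System.out.println(describe(tmp));

            // Mark the file read-only, then show the attributes again.
            tmp.setReadOnly();
            System.out.println(describe(tmp));

            // List the names in the directory that holds the file.
            // list() returns null if the directory cannot be read.
            String[] names = tmp.getParentFile().list();
            System.out.println("entries in parent: " + (names == null ? 0 : names.length));
        } finally {
            tmp.delete();
        }
    }
}
```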
Table 3.5 The Methods Defined by Writer
Method | Description |
Writer append(char ch) throws IOException | Appends ch to the end of the invoking output stream. Returns a reference to the stream. |
Writer append(CharSequence chars) throws IOException | Appends chars to the end of the invoking output stream. Returns a reference to the stream. |
Writer append(CharSequence chars, int begin, int end) throws IOException | Appends a subrange of chars, specified by begin and end, to the end of the output stream. Returns a reference to the stream. |
abstract void close( ) throws IOException | Closes the output stream. |
abstract void flush( ) throws IOException | Finalizes the output state so that any buffers are cleared. That is, it flushes the output buffers. |
void write(int ch) throws IOException | Writes the character in the low-order 16 bits of ch to the output stream. |
void write(char buffer[ ]) throws IOException | Writes a complete array of characters to the output stream. |
abstract void write(char buffer[ ], int offset, int numChars) throws IOException | Writes a subrange of numChars characters from the array buffer, beginning at buffer[offset] to the output stream. |
void write(String str) throws IOException | Writes str to the output stream. |
void write(String str, int offset, int numChars) throws IOException | Writes a subrange of numChars characters from the string str, beginning at the specified offset. |
The I/O Interfaces
Java’s I/O system includes the following interfaces (which are packaged in java.io):
Closeable | DataInput | DataOutput |
Externalizable | FileFilter | FilenameFilter |
Flushable | ObjectInput | ObjectInputValidation |
ObjectOutput | ObjectStreamConstants | Serializable |
Those used either directly or indirectly by the recipes in this chapter are DataInput, DataOutput, Closeable, Flushable, FileFilter, FilenameFilter, ObjectInput, and ObjectOutput.
The DataInput and DataOutput interfaces define a variety of read and write methods, such as readInt( ) and writeDouble( ), that can read and write Java’s primitive data types. They also specify read( ) and write( ) methods that parallel those specified by InputStream and OutputStream. All operations are byte-oriented. RandomAccessFile implements the DataInput and the DataOutput interfaces. Thus, random-access file operations in Java are byte-oriented.
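The following sketch illustrates the byte-oriented nature of these methods by using DataOutputStream and DataInputStream, two other classes that implement DataOutput and DataInput. It writes an int and a double to an in-memory byte array and reads them back in the same order. The names DataRoundTrip, pack( ), and unpack( ) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// DataRoundTrip is a hypothetical name used only for this sketch.
class DataRoundTrip {
    // Writes an int (4 bytes) followed by a double (8 bytes).
    static byte[] pack(int id, double price) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(id);
        out.writeDouble(price);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the values back. They must be read in the order they were written.
    static String unpack(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int id = in.readInt();
        double price = in.readDouble();
        return id + ":" + price;
    }
}
```

The same writeInt( ) and readDouble( ) calls are available on a RandomAccessFile because it implements the same two interfaces.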
Table 3.6 The Character Stream Classes
Character Stream Class | Description |
BufferedReader | Buffered input character stream. |
BufferedWriter | Buffered output character stream. |
CharArrayReader | Input stream that reads from a character array. |
CharArrayWriter | Output stream that writes to a character array. |
FileReader | Input stream that reads from a file. |
FileWriter | Output stream that writes to a file. |
FilterReader | Filtered reader. |
FilterWriter | Filtered writer. |
InputStreamReader | Input stream that translates bytes to characters. |
LineNumberReader | Input stream that counts lines. |
OutputStreamWriter | Output stream that translates characters to bytes. |
PipedReader | Input pipe. |
PipedWriter | Output pipe. |
PrintWriter | Output stream that contains print( ) and println( ). |
PushbackReader | Input stream that allows characters to be returned to the input stream. |
Reader | Abstract class that describes character stream input. |
StringReader | Input stream that reads from a string. |
StringWriter | Output stream that writes to a string. |
Writer | Abstract class that describes character stream output. |
The Closeable and Flushable interfaces are implemented by several of the I/O classes. They provide a uniform way of specifying that a stream can be closed or flushed. The Closeable interface defines only one method, close( ), which is shown here:
void close( ) throws IOException
This method closes an open stream. Once closed, the stream cannot be used again. All I/O classes that open a stream implement Closeable.
The Flushable interface also specifies only one method, flush( ), which is shown here:
void flush( ) throws IOException
Calling flush( ) causes any buffered output to be physically written to the underlying device. This interface is implemented by the I/O classes that write to a stream.
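The effect of flush( ) can be observed with a small sketch. It writes a few bytes to a BufferedOutputStream that wraps a ByteArrayOutputStream and checks how many bytes have reached the underlying array before and after the call. The names FlushDemo and sizesAroundFlush( ) are hypothetical, and the sketch assumes the data is smaller than the buffered stream’s internal buffer.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// FlushDemo is a hypothetical name used only for this sketch.
class FlushDemo {
    // Returns { bytes visible before flush(), bytes visible after flush() }.
    // Assumes data is smaller than BufferedOutputStream's internal buffer.
    static int[] sizesAroundFlush(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink);
        try {
            out.write(data);            // held in the buffer
            int before = sink.size();
            out.flush();                // pushed through to the sink
            int after = sink.size();
            return new int[] { before, after };
        } finally {
            out.close();
        }
    }
}
```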
FileFilter and FilenameFilter are used to filter directory listings.
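As a short sketch, the following code uses a FilenameFilter to obtain only those names in a directory that end with a given extension. The names ExtensionFilterDemo and listByExtension( ) are hypothetical.

```java
import java.io.File;
import java.io.FilenameFilter;

// ExtensionFilterDemo is a hypothetical name used only for this sketch.
class ExtensionFilterDemo {
    // Returns the names in dir that end with ext, such as ".txt".
    // Like File.list(), this returns null if dir is not a readable directory.
    static String[] listByExtension(File dir, final String ext) {
        return dir.list(new FilenameFilter() {
            public boolean accept(File parent, String name) {
                return name.endsWith(ext);
            }
        });
    }
}
```

FileFilter works in a similar way with listFiles( ), except that its accept( ) method receives a File object instead of a directory and a name.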
The ObjectInput and ObjectOutput interfaces are used when serializing (saving and restoring) objects.
The Compressed File Streams
In java.util.zip, Java provides a very powerful set of specialized file streams that handle the compression and decompression of data. All are subclasses of either InputStream or OutputStream, described earlier. The compressed file streams are shown here:
DeflaterInputStream | Reads data, compressing the data in the process. |
DeflaterOutputStream | Writes data, compressing the data in the process. |
GZIPInputStream | Reads a GZIP file. |
GZIPOutputStream | Writes a GZIP file. |
InflaterInputStream | Reads data, decompressing the data in the process. |
InflaterOutputStream | Writes data, decompressing the data in the process. |
ZipInputStream | Reads a ZIP file. |
ZipOutputStream | Writes a ZIP file. |
Using the compressed file streams, it is possible to automatically compress data while writing to a file or to automatically decompress data when reading from a file. You can also create compressed files that are compatible with the standard ZIP or GZIP formats, and you can decompress files in those formats. The actual compression is provided by the Inflater and Deflater classes, also packaged in java.util.zip. They use the ZLIB compression library. You won’t usually need to deal with these classes directly when compressing or decompressing files because their default operation is sufficient.
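The following sketch shows the automatic compression and decompression just described, using GZIPOutputStream and GZIPInputStream. To stay self-contained it works on in-memory byte arrays, but wrapping a FileOutputStream or FileInputStream instead would apply the same technique to a file. The names GzipRoundTrip, compress( ), and decompress( ) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// GzipRoundTrip is a hypothetical name used only for this sketch.
class GzipRoundTrip {
    // Compresses data into the GZIP format as it is written.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bytes);
        try {
            gz.write(data);
        } finally {
            gz.close(); // close() also writes the GZIP trailer
        }
        return bytes.toByteArray();
    }

    // Decompresses GZIP data as it is read.
    static byte[] decompress(byte[] gzipped) throws IOException {
        GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            in.close();
        }
    }
}
```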
Tips For Handling Errors
File I/O poses a special challenge when it comes to error handling. There are two reasons for this. First, I/O failures are a very real possibility when reading or writing files. Despite the fact that computer hardware (and the Internet) is much more reliable than in the past, it still fails at a fairly high rate, and any such failure must be handled in a manner consistent with the needs of your application. The second reason that error handling presents a challenge when working with files is that nearly all file operations can generate one or more exceptions. This means that nearly all file handling code must take place within a try block.
The most common I/O exception is IOException. This exception can be thrown by many of the constructors and methods in the I/O system. As a general rule, it is generated when something goes wrong when reading or writing data, or when opening a file. Other common I/O-related exceptions, such as FileNotFoundException and ZipException, are subclasses of IOException.
There is another common exception related to file handling: SecurityException. Many constructors or methods will throw a SecurityException if the invoking application does not have permission to access a file or perform a specific operation. You will need to handle this exception in a manner appropriate to your application. For simplicity, the examples in this chapter do not handle security exceptions, but it may be necessary for your applications to do so.
Because so many constructors and methods can generate an IOException, it is not uncommon to see code that simply wraps all I/O operations within a single try block and then catches any IOException that may occur. While adequate for experimenting with file I/O or possibly for simple utility programs that are for your own personal use, this approach is not usually suitable for commercial code. This is because it does not let you easily deal individually with each potential error. Instead, for detailed control, it is better to put each operation within its own try block. This way, you can precisely report and respond to the error that occurred. This is the approach demonstrated by the examples in this chapter.
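A minimal sketch of this style follows. It opens a file, reads one byte, and closes the file, with each operation in its own try block so that each failure is reported separately. The names SeparateTryDemo and firstByteReport( ) are hypothetical, and returning a message string is simply a convenience for this example.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

// SeparateTryDemo is a hypothetical name used only for this sketch.
class SeparateTryDemo {
    // Reads the first byte of a file and reports what happened.
    static String firstByteReport(String path) {
        FileInputStream in;

        // Step 1: open the file.
        try {
            in = new FileInputStream(path);
        } catch (FileNotFoundException e) {
            return "open failed: " + path;
        }

        // Step 2: read from the file.
        String result;
        try {
            int b = in.read();
            result = (b == -1) ? "empty file" : "first byte: " + b;
        } catch (IOException e) {
            result = "read failed: " + e.getMessage();
        }

        // Step 3: close the file, even if the read failed.
        try {
            in.close();
        } catch (IOException e) {
            result = result + " (close failed: " + e.getMessage() + ")";
        }
        return result;
    }
}
```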
Another way that IOExceptions are sometimes handled is by throwing them out of the method in which they occur. To do this you must include a throws IOException clause in the method’s declaration. This approach is fine in some cases, because it reports an I/O failure back to the caller. However, in other situations it is a dissatisfying shortcut because it causes all users of the method to handle the exception. The examples in this chapter do not use this approach. Rather, they handle all IOExceptions explicitly. This allows each error handler to report precisely the error that occurred.
If you do handle IOExceptions by throwing them out of the method in which they occur, you must take extra care to close any files that have been opened by the method. The easiest way to do this is to wrap your method’s code in a try block and then use a finally clause to close the file(s) prior to the method returning.
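The following sketch applies that advice to a method that copies one file to another and declares throws IOException. The finally clause closes both files whether or not an exception occurs. The names ThrowingCopy and copy( ) are hypothetical. Ignoring a failure when closing the input file, while letting a failure on the output file propagate, is a design choice made for this example.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// ThrowingCopy is a hypothetical name used only for this sketch.
class ThrowingCopy {
    // Copies src to dst and returns the number of bytes copied.
    // Any IOException is passed to the caller, but the files are still closed.
    static long copy(String src, String dst) throws IOException {
        InputStream in = null;
        OutputStream out = null;
        try {
            in = new FileInputStream(src);
            out = new FileOutputStream(dst);
            byte[] buf = new byte[4096];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            return total;
        } finally {
            // Runs on both normal return and exception.
            if (in != null) {
                try {
                    in.close();
                } catch (IOException ignored) {
                    // Nothing was written to this file, so no data can be lost.
                }
            }
            if (out != null) {
                out.close(); // a failure here may mean lost data, so let it propagate
            }
        }
    }
}
```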
In the examples in this chapter, any I/O exceptions that do occur are handled by simply displaying a message. While this approach is acceptable for the example programs, real applications will usually need to provide a more sophisticated response to an I/O error. For example, you might want to give the user the ability to retry the operation, specify an alternative operation, or otherwise gracefully handle the problem. Preventing the loss or corruption of data is a primary goal. Part of being a great programmer is knowing how to effectively manage the things that might go wrong when an I/O operation fails.
One final point: a common mistake that occurs when handling files is forgetting to close a file when you are done with it. Open files use system resources. Thus, there are limits to the number of files that can be open at any one time. Closing a file also ensures that any data written to the file is actually written to the physical device. Therefore, the rule is very simple: if you open a file, close the file. Although files are typically closed automatically when an application ends, it’s best not to rely on this because it can lead to sloppy programming and bad habits. It is better to explicitly close each file, properly handling any exceptions that might occur. For this reason, all files are explicitly closed by the examples in this chapter, even when the program is ending.