CS102: Streams & Files

Copyright © 1999, Kenneth J. Goldman

Persistent Storage

So far, we've treated computer software as something that is executed, "lives" for some period of time, and then terminates. The state of the program has been transient, meaning that the information (data) of the application has existed only for the duration of execution and then goes away.

For applications like a calculator or a computer game, transient state may suffice, but most computer applications would be essentially useless if you couldn't exit the program and then later resume where you left off.

For example,

In other words, we would like to make (at least some of) the state of an application persist even after the program exits, and we want to be able to use that state in a future execution of the program.

Therefore, it is essential that any computer system provide some means of persistent storage -- data that survives program termination, crashes and power outages.

Persistent data is generally kept on magnetic or similar devices. A disk has:

How it works (a simplified view):

  1. When disk I/O (input or output) is requested, the requesting thread is suspended and the operating system calls some disk driver software to activate the motors in the disk drive to position the head at the appropriate track. This is called a head seek.
  2. The driver waits for the appropriate sector to spin under the head and then reads or writes the data. A driver may read after writing to make sure there aren't errors.
  3. In the case of a read, the resulting data is stored into memory.
  4. The operating system is notified so that the requesting thread can be resumed.

This is all fine, except that it would be very awkward if our application saved and loaded data by specifying particular blocks (tracks and sectors) into which data would be saved. Why?

Therefore, it is necessary to have some kind of abstraction of persistent storage that insulates the user and application from these low-level details. Typically, this abstraction is provided by the operating system as a file system consisting of named files and directories (folders).

How the OS Stores Files and Directories

Typically, operating systems organize the disk by partitioning it into regions for different purposes. One partition is generally reserved for the directory structure, which is a pointer-based data structure stored on the disk. Here, the pointers are disk block addresses (instead of memory addresses), but the principle is the same. The structure in most operating systems is modeled after the inode structure used by UNIX. Think of a directory structure as a tree.
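To make the idea of block addresses acting as pointers concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the actual UNIX inode layout: the class names ToyNode and ToyDisk, the convention that the root directory lives in block 0, and the lookup method are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: the "disk" is a list of blocks, and a block address is an index into it.
class ToyNode {
    final boolean isDirectory;
    // For a directory: entry name -> block address of the child node.
    final Map<String, Integer> entries = new HashMap<>();
    // For a file: its contents (a real system would store data block addresses instead).
    String contents = "";

    ToyNode(boolean isDirectory) {
        this.isDirectory = isDirectory;
    }
}

class ToyDisk {
    static final int ROOT = 0; // hypothetical convention: root directory is block 0
    private final List<ToyNode> blocks = new ArrayList<>();

    ToyDisk() {
        blocks.add(new ToyNode(true));
    }

    // Allocates a new block for a child and records its address in the parent directory.
    int create(int parentAddr, String name, boolean isDirectory) {
        ToyNode parent = blocks.get(parentAddr);
        if (!parent.isDirectory) {
            throw new IllegalArgumentException("parent is not a directory");
        }
        blocks.add(new ToyNode(isDirectory));
        int addr = blocks.size() - 1;
        parent.entries.put(name, addr);
        return addr;
    }

    ToyNode read(int addr) {
        return blocks.get(addr);
    }

    // Follows block addresses from the root, one path component at a time.
    // Returns the block address of the target, or -1 if the path does not exist.
    int lookup(String path) {
        int addr = ROOT;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;
            ToyNode node = blocks.get(addr);
            Integer next = node.isDirectory ? node.entries.get(part) : null;
            if (next == null) return -1;
            addr = next;
        }
        return addr;
    }
}
```

Looking up "/home/notes.txt" in this model visits three blocks in turn (the root, the "home" directory, and the file), which mirrors how a path is resolved by following block addresses down the tree.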

The File System Abstraction

When users and application programmers think about files, we don't think about blocks, and we don't think about inodes. Instead, all of that is hidden from us by an abstraction barrier that includes:

User Interface:

Under the covers, the operating system manages the inodes and the disk to provide the illusion of a nice hierarchical file system to users and programmers.

Programmer API:

To make code portable across different operating systems, Java provides the package java.io, which is a general API for a file system. Underneath, java.io is implemented on each operating system using the file system API provided by that OS. We'll learn about the java.io package. The basic functionality provided by operating systems is similar, but generally has fewer features.

The goal when working with files is usually to

  1. open a file (possibly creating it)
  2. read and/or write data from/to the file
  3. close the file
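The three steps above can be sketched with the basic byte streams in java.io. This is only a minimal illustration: the class name OpenWriteReadClose and the helper methods save and load are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class OpenWriteReadClose {
    // Opens the file (creating or truncating it), writes the bytes, and closes it.
    static void save(String filename, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(filename); // 1. open
        try {
            out.write(data);                                   // 2. write
        } finally {
            out.close();                                       // 3. close, even on error
        }
    }

    // Opens the file, reads every byte until end of file, and closes it.
    static byte[] load(String filename) throws IOException {
        FileInputStream in = new FileInputStream(filename);    // 1. open
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {                    // 2. read; -1 marks end of file
                buffer.write(b);
            }
            return buffer.toByteArray();
        } finally {
            in.close();                                        // 3. close, even on error
        }
    }
}
```

Closing in a finally block means the file is released even if a read or write fails partway through.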

Reading and writing files can be accomplished by either:

We'll start with an example of sequential access for a file, using the classes DataInputStream and DataOutputStream.

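// requires: import java.io.*;

void writeFileExample(String filename, int someData, String myString) throws IOException {
    File f = new File(filename); // creates a File object but doesn't actually create a file on disk
    OutputStream out = new FileOutputStream(f); // creates (or truncates) the file on disk
    DataOutputStream dataOut = new DataOutputStream(out);
    dataOut.writeInt(someData);  // written as 4 bytes
    dataOut.writeUTF(myString);  // written with a length prefix so that readUTF can recover it
    dataOut.close();             // flushes and closes the underlying file
}

void readFileExample(String filename) throws IOException {
    File f = new File(filename);
    InputStream in = new FileInputStream(f);
    DataInputStream dataIn = new DataInputStream(in);
    int someData = dataIn.readInt();     // read values in the same order they were written
    String myString = dataIn.readUTF();  // readUTF pairs with writeUTF
    dataIn.close();
    System.out.println("Read data: " + someData + " " + myString);
}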


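The streams can also be constructed directly from the file name, without an intermediate File object:

DataOutputStream dataOut = new DataOutputStream(new FileOutputStream(filename));
DataInputStream dataIn = new DataInputStream(new FileInputStream(filename));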
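Because a data stream is accessed sequentially, values must be read back in the same order, and with the matching read method, as they were written. The following is a minimal sketch of a complete round trip. It assumes try-with-resources (Java 7 or later) to close the streams automatically, and the class name DataStreamRoundTrip and its methods are hypothetical.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {
    // Writes an int followed by a string; the stream is closed when the try block exits.
    static void write(String filename, int someData, String myString) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(filename))) {
            out.writeInt(someData);
            out.writeUTF(myString);
        }
    }

    // Reads the int first and the string second, mirroring the order used in write().
    static String read(String filename) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(filename))) {
            int someData = in.readInt();
            String myString = in.readUTF();
            return someData + " " + myString;
        }
    }
}
```

Swapping the two reads would not produce a compile error; it would misinterpret the bytes at run time, which is why the reader and writer must agree on the layout.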

Serializable Objects

DataInputStream and DataOutputStream are fine when you only want to save primitive data to persistent storage, but what if you want to save an object for a class you have defined?

In that case, you can use ObjectInputStream and ObjectOutputStream, but the objects you plan to save must implement the Serializable interface. For example,

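import java.io.Serializable;
import java.util.Vector;

public class Foo implements Serializable {

    int x, y;
    String myString;
    Vector<Object> v;

    public Foo(int x, int y, String myString) {
        this.x = x;
        this.y = y;
        this.myString = myString;
        v = new Vector<Object>();
    }

    // Adds obj to v; obj must itself be serializable if this Foo is to be saved.
    public void insert(Object obj) {
        v.addElement(obj);
    }
}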

Oddly enough, the Serializable interface contains no methods -- it simply indicates that the programmer wishes objects of this type to be able to be written into and read from streams. Data that is not to be saved can be marked transient.
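As a minimal sketch of the transient keyword, the example below serializes an object to an in-memory byte array and reads it back. The class names Session and TransientDemo, and the field names, are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    String user;               // saved to the stream
    transient String password; // skipped when the object is written

    Session(String user, String password) {
        this.user = user;
        this.password = password;
    }
}

class TransientDemo {
    // Writes s to a byte array and reads a copy back from it.
    static Session roundTrip(Session s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(s);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Session) ois.readObject();
        }
    }
}
```

In the copy returned by roundTrip, user keeps its value while password comes back as null, the default value for an object reference.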

Saving and loading serializable objects using files:

Foo f = new Foo(3, 4, "testing");
String filename = "testfile.data";
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(filename));
oos.writeObject(f); // writes f and everything reachable from it
oos.close();

All of the data in the instance variables of f will be saved to the file. This happens recursively, so all objects referenced by f's instance variables are also saved. All of these objects must be serializable. For example, if an object stored in f's Vector v is not serializable, writeObject throws a NotSerializableException. (The algorithm does detect cycles, so an object that is reachable more than once is written only once.)
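Both behaviors can be checked with a small experiment. The following is a minimal sketch; the class names Node and GraphDemo and their members are hypothetical, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    Node next;      // followed recursively by writeObject
    Object payload; // whatever is stored here must be serializable at write time
}

class GraphDemo {
    // Serializes o to a byte array and deserializes a copy.
    static Object roundTrip(Object o) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(o);
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return ois.readObject();
        }
    }

    // Two nodes that point at each other: the cycle is preserved in the copy.
    static boolean cycleSurvives() throws IOException, ClassNotFoundException {
        Node a = new Node();
        Node b = new Node();
        a.next = b;
        b.next = a;
        Node copy = (Node) roundTrip(a);
        return copy.next.next == copy;
    }

    // A plain java.lang.Object is not Serializable, so writing it fails.
    static boolean failsOnNonSerializable() throws IOException, ClassNotFoundException {
        Node n = new Node();
        n.payload = new Object();
        try {
            roundTrip(n);
            return false;
        } catch (NotSerializableException e) {
            return true;
        }
    }
}
```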

At a later time, we could retrieve the data as follows:

ObjectInputStream ois = new ObjectInputStream(new FileInputStream("testfile.data"));
Foo g = (Foo) ois.readObject(); // may throw IOException or ClassNotFoundException
ois.close();

Now the variable g will refer to a fully initialized instance of Foo with all the data. This is a convenient way to make objects persistent. Note that if you modify the class Foo and try to read data saved from the old version, an InvalidClassException will occur unless the static final long serialVersionUID values of the two versions are the same.
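Declaring the version identifier explicitly looks like the following minimal sketch. The class names Account and VersionDemo and the value 42L are hypothetical; ObjectStreamClass is the java.io class that describes a class's serialized form.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class Account implements Serializable {
    // Explicit version id. Keep the same value across compatible changes to the class
    // so that data saved by an older version can still be read.
    private static final long serialVersionUID = 42L;

    int balance;
}

class VersionDemo {
    // Returns the serialVersionUID that the serialization machinery uses for class c.
    // Assumes c is serializable; lookup returns null otherwise.
    static long versionOf(Class<?> c) {
        return ObjectStreamClass.lookup(c).getSerialVersionUID();
    }
}
```

If the field is omitted, a default value is computed from the details of the class, so even a small edit to the class can change it. Declaring it explicitly puts the compatibility decision in the programmer's hands.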