File System Implementation

File system implementation defines how files and directories are stored, how disk space is managed, and how to make everything work efficiently and reliably.

File-System Layout

File Systems are stored on disks. The above figure depicts a possible File-System Layout.

MBR: Master Boot Record is used to boot the computer
Partition Table: Partition table is present at the end of MBR. This table gives the starting and ending addresses of each partition.
Boot Block: When the computer is booted, the BIOS reads in and executes the MBR. The first thing the MBR program does is locate the active partition, read in its first block, which is called the boot block, and execute it. The program in the boot block loads the operating system contained in that partition. Every partition contains a boot block at the beginning though it does not contain a bootable operating system.
Super Block: It contains all the key parameters about the file system and is read into memory when the computer is booted or the file system is first touched.

Implementing Files

Contiguous Allocation:
Each file is stored as a contiguous run of disk blocks.
Example: On a disk with 1KB blocks, a 50KB file would be allocated 50 consecutive blocks. With 2KB blocks it would be 25 consecutive blocks.
Each file begins at the start of a new block, so that if file A is occupying 3½ blocks, some space is wasted at the end of the last block.

Advantages:
Simple to implement.
The read performance is excellent because the entire file can be read from the disk in a single operation.

Drawbacks:
Over the course of time the disk becomes fragmented.
Linked List Allocation:
The second method for storing files is to keep each one as a linked list of disk blocks. The first word of each block is used as a pointer to the next one. The rest of the block is for data. Unlike Contiguous allocation no space is lost in disk fragmentation.

Random access of a file is very slow.
Linked-List Allocation Using a Table in Memory:
The disadvantage of linked list can be overcome by taking the pointer word from each disk block and putting it in a table in memory. Such a table in main memory is called a FAT (File Allocation Table). Using FAT random access can be made much easier.
The primary disadvantage of this method is that the entire table must be in memory all the time to make it work.
I-nodes: I-node is a data structure which is used to identify which block belongs to which file. It contains the attributes and disk addresses of the file's blocks. Unlike the in-memory table the i-node need to be in memory only when the corresponding file is open.

Implementing Directories

The main function of the directory system is to map the ASCII name of the file onto the information needed to locate the data. A directory can be designed in two ways.

In a simple design a directory consists of a list of fixed-size entries,one per file, containing a (fixed-length) file name, a structure of the file attributes, and one or more disk addresses telling where the disk blocks are.
For systems that use i-nodes, another possibility for storing the attributes is in the i-nodes, rather than in the directory entries. In that case, the directory entry can be shorter: just a file name and an i-node number.

Shared Files

When several users are working together sharing of files takes place. In such a case it is convenient for a file to appear simultaneously in different directories belonging to different users. Sharing files is convenient but also introduces problems like if the original file or the shared file are appended with new features then it will not be updated in both the copies of the file, it is updated only at its original location. This way the purpose of file sharing is defeated. This problem can be solved by two ways:

i-node
Symbolic linking

Log-Structured File Systems

The idea that drove the LFS design is that as CPU's and RAM's are getting faster and larger in the future, most disk accesses will be writes, so the read-ahead mechanism used in some file systems to fetch blocks before they are needed no longer gains much performance.
Using LFS all writes are initially buffered in memory, and periodically all the buffered writes are written to the disk in a single segment, at the end of the log. Opening a file now consists of using the map to locate the i-node for the file. Once the i-node has been located, the addresses of the blocks can be found from it. All of the blocks will themselves be in segments, somewhere in the log.
Since eventually the log will occupy the entire disk, at which time no new segments can be written to the log. To deal with this problem, LFS has a cleaner thread that spends its time scanning the log circularly to compact it. In this the disk is a big circular buffer, with the writer thread adding new segments to the front and the cleaner thread removing old ones from the back.

Journaling File Systems

The basic idea here is to keep a log of what the file system is going to do before it does it, so that if the system crashes before it can do its planned work, upon rebooting the system can look in the log to see what was going on at the time of the crash and finish the job. Such file systems, called journaling file systems, are actually in use.
Operations that take place for removing a file:

Remove the file from its directory
Release the i-node to the pool of free i-nodes
Return all the disk blocks to the pool of free disk blocks

The functioning of Journaling file system is such that it first writes a log entry listing the three actions to be completed. The log entry is then written to disk. Only after the log entry has been written, the various operations begin. After the operations are completed successfully, the log entry is erased. If the system now crashes upon recovery the file system can check the log to see if any operations were pending. If so, all of them can be rerun (multiple times in the event of repeated crashes) until the file is correctly removed.
Crash recovery can be made fast and secure when log operations are idempotent.

Virtual File Systems

An operating system can have multiple file systems in it. Virtual File Systems are used to integrate multiple file systems into an orderly structure. The key idea is to abstract out that part of the file system that is common to all file systems and put that code in a separate layer that calls the underlying concrete file system to actually manage the data.

Structure of Virtual File Systems in UNIX system:

The VFS also has a 'lower' interface to the concrete file systems, which is labeled VFS interface. This interface consists of several dozen function calls that the VFS can make to each file system to get work done. VFS has two distinct interfaces: the upper one to the user processes and the lower one to the concrete file systems
VFS supports remote file systems using the NFS (Network File System) protocol.