Client-Side Operations with Local Files

Part 1. File API

Even the most Web-centric application sometimes requires limited access to the local file system. An application may read a local file and display its contents in a browser-based text editor, compute cryptographic hashes of files, or process an image before sending it to a remote server. File access techniques in the Web context must be conservative enough to keep the computer safe from malware, yet flexible enough to satisfy the demands of feature-rich online applications. Before the advent of HTML5, Web developers could only work within the scope of a file-select control.

In 1995, RFC 1867 proposed form-based file upload as an extension to HTML 2.0, but it did not provide any client-side interfaces to handle such uploads. The input element of type file was displayed as a text field and a "Browse" button. When the user clicked the button, an "Open File" system dialog showed up. After the user selected a file, it was uploaded to the server along with the other parts of the form's data. Client-side scripts could neither get the attributes of the file nor read its data locally. In a typical client-server HTTP communication model the file was considered part of the payload (entity) of an HTTP request. File metainformation was formed as a set of entity header fields (Content-Length, Content-Type); the file contents were sent to a server as part of the entity body. Any further operations were presumed to be server-side:

form with <input> control
<form action="http://example.com/cgi-scripts/file-handler.php" method="POST" enctype="multipart/form-data">
 <input type="file" name="file-control">
 <input type="submit" value="submit">
</form>

CGI/FastCGI server-side script
<?php
  $name=$_FILES["file-control"]["name"];
  $type=$_FILES["file-control"]["type"];
  $size=$_FILES["file-control"]["size"];
?>

In the absence of file-related JavaScript interfaces, developers had to rely on proprietary plugins. Java applets or ActionScript classes from the flash.net and flash.filesystem packages could solve some of the difficulties, but could also endanger the safety of the local system. Heavy reliance on external plugins made Web applications rather vulnerable, so there was a real need for new client interfaces without any dependence on third-party extensions. The solution was proposed by the W3C. The Web community's collaborative efforts laid the foundations of the JavaScript File API, which went on to gain the backing of major browser vendors.

The main "building blocks" of the File API are:

  • the <input> element representing a list of selected files; the list itself is an instance of the FileList collection;
  • Blob objects for storing raw binary data;
  • File objects inheriting their methods and properties from the Blob interface;
  • the FileReader interface based on an asynchronous, event-oriented model for reading files from the underlying system;
  • special object URLs used to refer to Blob and File instances.

File Control

Access to file objects is granted to Web applications through an <input> element, as a result of a drag-and-drop operation, or by getting a file from browser-controlled file storage. The input element of type file falls into the category of interactive content elements; its type attribute is said to be in the File Upload state. To suggest which MIME types are expected for upload, the input element can have the accept attribute; it filters what the file picker offers but is a hint rather than a guarantee, so it does not by itself validate the selection. The boolean multiple attribute enables multiple selection of files:

<input type="file" name="file-control" accept="image/*" multiple>
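Because the accept attribute only guides the file picker, a script may want to re-check the selected files itself. The following is a minimal sketch under that assumption; onlyImages is a hypothetical helper name, not part of the File API, and it relies on each file exposing the type property.

```javascript
// Hypothetical helper: keeps only the files whose MIME type begins with "image/".
// Works on a FileList or on any array-like object whose items have a type property.
function onlyImages(fileList) {
 return Array.from(fileList).filter(function (file) {
  return file.type.indexOf("image/") === 0;
 });
}
```

It could be called from a change handler, for example as onlyImages(fileControl.files), before the files are processed any further.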

The required attribute makes file selection compulsory: an attempt to submit a form containing an empty file control with this attribute will cause the browser to show a prompt to the user, e.g. "Please select a file." Custom prompts can be created by calling validation routines:

<input type="file" name="file-control" accept="image/*" multiple required
 oninvalid="showPrompt(this)" onchange="this.setCustomValidity('');">

When the input element fails to validate, an invalid event is fired. In the example above, the event is handled by a named function that can set a custom error message shown to the user:

function showPrompt(fileControl) {
 fileControl.setCustomValidity('Please select your images before uploading them to the server.');
}

When a file is selected, the change event is dispatched. Passing the empty string to the setCustomValidity() method will clear the custom error.

FileList Collection

After the user selects a file, the name of the file becomes the value of the input element. In some environments the value presents itself as a fakepath (e.g. C:\fakepath\localfile.txt). Multiple selection returns a list of the chosen files. The list is an instance of the FileList collection. Even if a single file is selected, it is considered an item of a FileList with the length property equal to one. We obtain access to FileList programmatically by using the files property of the input element:

file control with general attributes
<input type="file" name="file-control" accept="image/*" multiple onchange="countFiles(this)">

getting the number of selected images
function countFiles(fileControl){
 var fileList=fileControl.files;
 console.log(fileList.length);
}

As any FileList is an indexed sequence, access to each separate file is obtained either by using the file index (var file=fileList[0];) or by calling the item() method of the FileList (var file=fileList.item(0);).
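A FileList is array-like rather than a true array, so array methods such as reduce() are not available on it directly. The sketch below converts the list with Array.from() first; totalSize is a hypothetical helper name introduced only for illustration.

```javascript
// Hypothetical helper: sums the sizes (in bytes) of all files in a FileList.
// Array.from() turns the array-like FileList into a real array.
function totalSize(fileList) {
 return Array.from(fileList).reduce(function (sum, file) {
  return sum + file.size;
 }, 0);
}
```

In a change handler it might be used as console.log(totalSize(fileControl.files)).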

File

Separate items of a FileList are instances of the File interface. The File interface inherits from Blob (Binary Large Object), an API designed for low-level byte manipulation. Blob meets the definition of an opaque block of binary data and was once part of the Google Gears interfaces. Gears' Blob API had a length property and two methods, getBytes() and slice(). A modern Blob implementation is expected to have the size and type properties as well as the slice() method.

A Blob's size returns its length in bytes. Its type holds a MIME type, e.g. text/plain, image/jpeg, audio/mid, or video/mpeg, or an empty string when the type is unknown. Slicing a Blob object means extracting a range of its bytes into a new Blob instance, usually of a smaller size. The slice() method takes up to three optional parameters: the first points to the start index in the range of the Blob's bytes, the second defines the end point of the range, and the third is a content type giving the desired MIME type for the new Blob. If no arguments are provided, a binary copy of the original Blob is created.
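The behaviour of slice() can be tried on a Blob built from a string. This is a minimal sketch that assumes an environment with a global Blob constructor (a browser or a recent Node.js); the variable names are illustrative only.

```javascript
// A 15-byte Blob created from ASCII text.
var source = new Blob(["Hello, File API"], { type: "text/plain" });

// Bytes 0 to 4; the third argument sets the MIME type of the new Blob.
var head = source.slice(0, 5, "text/plain");

// A negative start index counts from the end: the last three bytes.
var tail = source.slice(-3);

// No arguments: a copy covering the whole byte range.
var copy = source.slice();

console.log(head.size, head.type); // 5 "text/plain"
console.log(tail.size);            // 3
console.log(copy.size);            // 15
```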

Besides interface members inherited from Blob, the File interface has two more attributes bearing metadata for a selected file: these are the name and lastModified properties. In the example below, a list of audio files picked out by the user is parsed sequentially: information about each file is represented as a table row; the file size and modification date are formatted according to the custom options.

input element for picking out audio files
<input type="file" name="file-control" accept="audio/*" multiple onchange="showFilesInfo(this.files)">

extracting files metainformation
function showFilesInfo(fileList) {
 var table=document.createElement("table");
 table.border="1";
 var options={ // date and time formatting options
  year: "numeric", month: "2-digit", day: "numeric",
  hour: "2-digit", minute: "2-digit", second: "2-digit"
 };
 for(var i=0; i<fileList.length; i++) {
  var file=fileList[i];
  var row=table.insertRow(i);
  row.insertCell().textContent=file.name; // both base name and extension
  row.insertCell().textContent=new Intl.NumberFormat(["en-US"]).format(file.size)+" bytes";
  row.insertCell().textContent=file.type; // MIME type detected by the browser, or "" if unknown
  // lastModified is a timestamp in milliseconds, so it is wrapped in a Date before formatting
  row.insertCell().textContent=new Intl.DateTimeFormat(["en-US"], options).format(new Date(file.lastModified));
 }
 document.body.appendChild(table);
}

URL for File Reference

The earliest draft of the File API specification mentioned the urn property as a means to associate a File object with a unique URI. Such URIs could be used programmatically within the lifetime of the script which invoked the URI assignment. However, using UUID URN namespaces in Web applications did not win full support among Web developers, so a new URI scheme for a File object was proposed. The scheme was named blob:. A blob: URL consists of the blob: scheme token, an opaque string formed as a unique value, and an optional fragment identifier. Web applications can use the URL to generate new Web page contents dynamically:

input element for selecting an image
<input type="file" name="file-control" accept="image/*" onchange="showImage(this.files[0])">

creating an <img> element dynamically
function showImage(file) {
 var url=URL.createObjectURL(file);
 var img=new Image();
 img.src=url;
 document.body.appendChild(img);
}

The blob: URL is created programmatically with the help of the URL interface. The static createObjectURL() method accepts a Blob as its first argument. A second argument is not part of the standard method and depends on the actual blob: URL implementation in the browser; for example, Internet Explorer allows the inclusion of the oneTimeOnly option:

var url=URL.createObjectURL(file, {oneTimeOnly: true});

When an object URL is no longer needed, it should be removed by calling the revokeObjectURL() method:

 URL.revokeObjectURL(url);
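One common place to revoke the URL is the load handler of the element that consumed it, since the image data has already been fetched by then. The following is a sketch of that pattern, assuming a browser environment with Image and document available; showImageOnce is a hypothetical name for a variant of the showImage() function above.

```javascript
// Hypothetical variant of showImage(): releases the object URL
// as soon as the image has loaded or has failed to load.
function showImageOnce(file) {
 var url = URL.createObjectURL(file);
 var img = new Image();
 img.onload = img.onerror = function () {
  URL.revokeObjectURL(url); // the URL is no longer needed after this point
 };
 img.src = url;
 document.body.appendChild(img);
 return img;
}
```

The trade-off is that the URL cannot be reused afterwards, for instance to show the same file in a second element; in that case it is simpler to keep the URL and revoke it when the page no longer needs the file.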

Object URLs can be used in conjunction with an XMLHttpRequest instance. In this case the browser emulates HTTP message exchange and provides responses based on such URLs with Content-Type and Content-Length headers describing the referenced Blob. Both headers are made available when the readyState property of the request is equal to 2.
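The header exchange described above can be observed with a short script. This is a sketch only, assuming a browser that exposes XMLHttpRequest and serves blob: URLs as described; inspectBlobHeaders and its callback are hypothetical names used for illustration.

```javascript
// Hypothetical helper: requests an object URL and reports the two headers
// once the request reaches readyState 2 (HEADERS_RECEIVED).
function inspectBlobHeaders(url, callback) {
 var xhr = new XMLHttpRequest();
 xhr.open("GET", url);
 xhr.onreadystatechange = function () {
  if (xhr.readyState === 2) {
   callback({
    type: xhr.getResponseHeader("Content-Type"),
    length: xhr.getResponseHeader("Content-Length")
   });
  }
 };
 xhr.send();
}
```

A call might look like inspectBlobHeaders(URL.createObjectURL(file), function (headers) { console.log(headers.type, headers.length); }).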

Another non-trivial application of blob: URLs is their use in CSS:

selecting a local image to create a user-defined theme for an element
<input type="file" name="file-control" accept="image/*" onchange="fillBackground(this.files[0])">

setting background image of the element
function fillBackground(file) {
 var url=URL.createObjectURL(file);
 document.getElementById("pattern-cells").style.backgroundImage="url("+url+")";
}