Web Forms

Form Data Set and HTTP Requests


Traditional data exchange between a Web application and a remote server is based on the concept of the Web form. Any modern application dealing with online logins, user profiles, or file uploads has to perform common tasks such as collecting user-supplied data, displaying the data through a set of visual controls, and submitting the data items to a server process. The WebSocket protocol is often preferred for maintaining bidirectional communication between a client and a remote host. However, forms remain the main instrument of data submission that does not require multiple connections. Web frameworks usually provide developers with high-level APIs for working with forms, whereas the low-level approach presumes knowledge of HTTP-related techniques.

Forms in HTML Documents

An HTML form contains special elements called controls. The user modifies the controls before submitting the form to a processing agent. The agent is a program that receives the submitted form data and handles the controls' names and values. The following form attributes define how a form interacts with its processing agent:

  • accept-charset: lists the character encodings that the processing agent accepts for the submitted data;
  • action: refers to the URI where the processing agent resides; if the attribute is absent, the form data set is submitted to the URI of the current document; the javascript: resource identifier scheme is permitted, but its use does not result in an HTTP request;
  • enctype: specifies the content type used to encode the form data; possible values are application/x-www-form-urlencoded (default), multipart/form-data and text/plain;
  • method: names the HTTP method, GET (default) or POST, used to build the HTTP request that carries the form data to the server.

A simple form processed entirely on the client side could look like this:

<form action="javascript:handler()">
 <input type="text" name="text-control">
 <input type="submit">
</form>
. . .
<script>
function handler() {
 var form=document.forms[0];
 console.log(form.action); // logs the javascript:handler() URI
 console.log(form.method.toUpperCase()); // logs GET, the default method
 console.log(form.enctype); // logs application/x-www-form-urlencoded
 var control=form.elements["text-control"];
 console.log(control.value);
}
</script>

Web Forms: GET Method

The central notion of Web communications is the message, the basic unit of application-level data exchange. Both the HTTP request and the HTTP response are messages. The information transferred as the payload of a message is called an entity. A multipart entity can be built from a set of smaller entities.

An ordinary GET request consists of the request line, general headers and request headers:

request line: HTTP method, URI of the requested Web resource and the protocol version
GET / HTTP/1.1
general headers
Connection: keep-alive
. . .
request headers: additional information about the request and the client that has generated it
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
. . .

If a Web page contains a form with the GET method and a server-side URI as the form's action, submitting the form data passes through several stages. First, the sequence of the form controls' name/value pairs is transformed into a form data set. The set is then encoded according to the value of the enctype attribute. The final step is the creation of the request URI: the value of the action attribute is combined with the ? sign followed by the encoded form data set. The example below demonstrates an application that sends the name of an author and the title of one of the author's books to get the book's publication date. The form omits the method attribute, so the default GET method applies. The application accepts the user-supplied form data and submits the generated data set to the server. The encoding of the form data is application/x-www-form-urlencoded, so if the user enters 'Daphne du Maurier' as author and 'The Flight of the Falcon' as title, the resulting URI is http://example.com/request-handler.php?author=Daphne+du+Maurier&title=The+Flight+of+the+Falcon. The server-side processing agent uses the PHP superglobal $_GET to obtain the submitted values:

<form action="http://example.com/request-handler.php">
 <input type="text" name="author">
 <input type="text" name="title">
 <input type="submit">
</form>

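<?php
 // user-supplied values are escaped before being written into the HTML output
 echo "<i>".htmlspecialchars($_GET["title"])."</i> by ".htmlspecialchars($_GET["author"])." was published in ".$publication;
?>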

The variable $publication may be the result of a server-side search operation based on the submitted form data set. The output for the example values is as follows:

The Flight of the Falcon by Daphne du Maurier was published in 1965
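
The encoding and URI-building stages described above can be sketched in Java. This is a minimal illustration, not the browser's actual algorithm: the class FormUrlEncoder and its methods are hypothetical names, UTF-8 is assumed as the character encoding, and java.net.URLEncoder is used because it produces the application/x-www-form-urlencoded format (the Charset overload requires Java 10 or later).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a GET request URI from form controls.
class FormUrlEncoder {

    // Encodes name/value pairs as application/x-www-form-urlencoded (UTF-8 assumed).
    static String encode(Map<String, String> controls) {
        StringJoiner dataSet = new StringJoiner("&");
        for (Map.Entry<String, String> control : controls.entrySet()) {
            String name = URLEncoder.encode(control.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(control.getValue(), StandardCharsets.UTF_8);
            dataSet.add(name + "=" + value);
        }
        return dataSet.toString();
    }

    // Combines the action URI, the ? sign and the encoded form data set.
    static String buildRequestUri(String action, Map<String, String> controls) {
        return action + "?" + encode(controls);
    }

    public static void main(String[] args) {
        Map<String, String> controls = new LinkedHashMap<>(); // keeps the order of the controls
        controls.put("author", "Daphne du Maurier");
        controls.put("title", "The Flight of the Falcon");
        // prints http://example.com/request-handler.php?author=Daphne+du+Maurier&title=The+Flight+of+the+Falcon
        System.out.println(buildRequestUri("http://example.com/request-handler.php", controls));
    }
}
```

A LinkedHashMap is used so that the pairs appear in the same order as the controls in the form; a form with repeated control names would need a list of pairs instead.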

In CGI / FastCGI scenarios the form data set is handled as the CGI variable QUERY_STRING:

<?php
 echo $_SERVER["QUERY_STRING"]; // returns author=Daphne+du+Maurier&title=The+Flight+of+the+Falcon
?>

Java application servers expose a number of APIs to work with query strings, too: for example, an instance of an HttpServletRequest can use getQueryString() and getParameterNames() methods to obtain submitted form data:

<%
 /* JSP code snippet */
 . . .
 // form data as encoded query string
 out.println(request.getQueryString());
 . . .
 // form data as request parameters
 Enumeration<String> parameterNames=request.getParameterNames();
 while(parameterNames.hasMoreElements()) {
  String parameterName=parameterNames.nextElement();
  String parameterValue=request.getParameter(parameterName);
  out.println(parameterName+"="+parameterValue);
  out.println("<br>");
 }
%>
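
What a container does when it turns a query string into request parameters can be approximated with plain JDK classes. The sketch below is an illustration only, not the servlet container's implementation: QueryStringParser is a hypothetical name, UTF-8 is assumed, and a repeated parameter name simply overwrites the earlier value.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: decodes an application/x-www-form-urlencoded query string.
class QueryStringParser {

    static Map<String, String> parse(String queryString) {
        Map<String, String> parameters = new LinkedHashMap<>();
        if (queryString == null || queryString.isEmpty()) {
            return parameters;
        }
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate stray separators such as "a=1&&b=2"
            }
            int equalsSign = pair.indexOf('=');
            String rawName = equalsSign < 0 ? pair : pair.substring(0, equalsSign);
            String rawValue = equalsSign < 0 ? "" : pair.substring(equalsSign + 1);
            // URLDecoder turns + into a space and resolves %XX escapes
            parameters.put(URLDecoder.decode(rawName, StandardCharsets.UTF_8),
                           URLDecoder.decode(rawValue, StandardCharsets.UTF_8));
        }
        return parameters;
    }

    public static void main(String[] args) {
        Map<String, String> parameters =
            parse("author=Daphne+du+Maurier&title=The+Flight+of+the+Falcon");
        for (Map.Entry<String, String> parameter : parameters.entrySet()) {
            System.out.println(parameter.getKey() + "=" + parameter.getValue());
        }
    }
}
```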

Web Forms: POST Method

The same example with the POST method can have the following client and server code:

<form action="http://example.com/request-handler.php" method="post">
 <input type="text" name="author">
 <input type="text" name="title">
 <input type="submit">
</form>

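<?php
 // user-supplied values are escaped before being written into the HTML output
 echo "<i>".htmlspecialchars($_POST["title"])."</i> by ".htmlspecialchars($_POST["author"])." was published in ".$publication;
?>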

If the method of a form is POST, the structure of the HTTP request changes: the form data set is sent to the processing agent as the payload of the HTTP request, so both entity headers and an entity body are present. The request URI is the value of the action attribute, with no query string appended. If the user has entered 'Daphne du Maurier' as author and 'Jamaica Inn' as the value of the title control, the server receives the following request:

request line
general headers
request headers
Content-Type: application/x-www-form-urlencoded     (entity header)
Content-Length: 42                                  (entity header)

author=Daphne+du+Maurier&title=Jamaica+Inn          (encoded entity body)

In Java server-side code the values of the entity headers can be retrieved by calling the getContentType() and getContentLength() methods:

<%
 /* JSP code snippet */
 . . .
 out.println(request.getContentType()); // returns application/x-www-form-urlencoded
 . . .
 out.println(request.getContentLength()); // returns 42
%>
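
The relationship between the entity body and the Content-Length header can be checked with a few lines of Java. The sketch below assembles the text of such a POST request by hand; PostRequestSketch is a hypothetical name, the Host header is an assumption added to make the message complete, and a real client would add further general and request headers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: assembles the text of a form POST request.
class PostRequestSketch {

    // Content-Length is the number of bytes in the encoded entity body.
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    static String build(String host, String path, String body) {
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"                                  // assumed request header
             + "Content-Type: application/x-www-form-urlencoded\r\n"    // entity header
             + "Content-Length: " + contentLength(body) + "\r\n"        // entity header
             + "\r\n"                                                    // blank line ends the headers
             + body;                                                     // encoded entity body
    }

    public static void main(String[] args) {
        String body = "author=Daphne+du+Maurier&title=Jamaica+Inn";
        System.out.println(contentLength(body)); // prints 42
        System.out.println(build("example.com", "/request-handler.php", body));
    }
}
```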

Complex Form Data

If a form is intended for file submission, the encoding type of the form should be set to multipart/form-data. In this case the entity included in the HTTP request is a multipart entity. Its Content-Type header field requires the boundary parameter, a token that delimits the body parts of the entity. In the example below, the user uploads three files to a library server: the first file is a book in PDF format, the second is the book's metadata described in XML, and the third is an audio version of the book. In addition, the user supplies the title of the book as the value of the text control:

<form method="post" action="http://example.com/webapp/book-keeper.jsp" enctype="multipart/form-data">
 <input type="text" name="title">
 <input type="file" name="book">
 <input type="file" name="metadata">
 <input type="file" name="audio-version">
 <input type="submit">
</form>

The HTTP request bearing a multipart entity has the following structure. In the body, each delimiter line consists of two hyphens followed by the boundary value, and the closing delimiter ends with two more hyphens:

request line
general headers
request headers
Content-Type: multipart/form-data; boundary=-----------------------------119241944829011     (entity header)
Content-Length: 537617     (entity header)

-------------------------------119241944829011     (entity part delimiter)
Content-Disposition: form-data; name="title"     (entity part header)

Frenchman's Creek     (entity part body)
-------------------------------119241944829011     (entity part delimiter)
Content-Disposition: form-data; name="book"; filename="book.pdf"     (entity part header)
Content-Type: application/pdf     (entity part header)

. . . PDF file contents . . .     (entity part body)
-------------------------------119241944829011     (entity part delimiter)
Content-Disposition: form-data; name="metadata"; filename="metadata.xml"     (entity part header)
Content-Type: text/xml     (entity part header)

. . . XML file contents . . .     (entity part body)
-------------------------------119241944829011     (entity part delimiter)
Content-Disposition: form-data; name="audio-version"; filename="audio-book.mp3"     (entity part header)
Content-Type: audio/mpeg     (entity part header)

. . . media file contents . . .     (entity part body)
-------------------------------119241944829011--     (closing delimiter)
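
The layout of a multipart entity can be made concrete by assembling one by hand. The following Java sketch is an illustration under simplifying assumptions, not a production encoder: MultipartBodySketch and its methods are hypothetical names, control names and file names are written without any escaping, and the boundary is assumed not to occur inside the part bodies. Note that every delimiter line is the boundary value prefixed with two hyphens, and the closing delimiter carries two more hyphens at the end.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: assembles a multipart/form-data entity body in memory.
class MultipartBodySketch {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    MultipartBodySketch(String boundary) {
        this.boundary = boundary;
    }

    // Value of the Content-Type entity header that goes with the body.
    String contentTypeHeader() {
        return "multipart/form-data; boundary=" + boundary;
    }

    // Adds a part for an ordinary control such as a text field.
    void addText(String name, String value) {
        write("--" + boundary + CRLF);
        write("Content-Disposition: form-data; name=\"" + name + "\"" + CRLF + CRLF);
        write(value + CRLF);
    }

    // Adds a part for a file control; the content is copied byte for byte.
    void addFile(String name, String fileName, String contentType, byte[] content) {
        write("--" + boundary + CRLF);
        write("Content-Disposition: form-data; name=\"" + name
            + "\"; filename=\"" + fileName + "\"" + CRLF);
        write("Content-Type: " + contentType + CRLF + CRLF);
        body.write(content, 0, content.length);
        write(CRLF);
    }

    // Appends the closing delimiter and returns the complete entity body.
    byte[] finish() {
        write("--" + boundary + "--" + CRLF);
        return body.toByteArray();
    }

    private void write(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        body.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) {
        MultipartBodySketch entity = new MultipartBodySketch("example-boundary-119241944829011");
        entity.addText("title", "Frenchman's Creek");
        entity.addFile("metadata", "metadata.xml", "text/xml",
            "<book/>".getBytes(StandardCharsets.UTF_8));
        byte[] bytes = entity.finish();
        System.out.println("Content-Type: " + entity.contentTypeHeader());
        System.out.println("Content-Length: " + bytes.length);
        System.out.println();
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
    }
}
```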

To get programmatic access to the constituent parts of the multipart entity in a Java Web application, the getParts() method of HttpServletRequest can be invoked:

<%
 /* JSP code snippet */
 . . .
 out.println("HTTP request contains multipart entity:");
 out.println("<br>");
 Collection<Part> parts=request.getParts();
 Iterator<Part> partsIterator=parts.iterator();
 while(partsIterator.hasNext()) {
  Part part=partsIterator.next();
  out.println("name of the part: "+part.getName());
  out.println("<br>");
  . . .
 }
%>

The getName() method corresponds to the name parameter of the Content-Disposition header. Other characteristics of entity parts can be retrieved by calling getContentType(), getSize() and getHeaderNames() methods:

out.println("content type: "+part.getContentType()); // returns null for title text control
. . .
out.println("part size: "+String.valueOf(part.getSize())); // size of entity part in bytes
. . .
Collection<String> headerNames=part.getHeaderNames();
Iterator<String> headersIterator=headerNames.iterator();
while(headersIterator.hasNext()) {
 String headerName=headersIterator.next();
 String headerValue=part.getHeader(headerName);
 out.println(headerName+": "+headerValue); out.println("<br>");
}
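
The name and filename values reported for a part originate in the parameters of its Content-Disposition header. The sketch below shows one way to extract them from a raw header value with plain JDK classes. It is a simplified illustration: ContentDispositionSketch is a hypothetical name, only quoted parameter values are recognized, and escaped quotes inside values are not handled. In Servlet 3.1 and later, Part.getSubmittedFileName() returns the file name directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts parameters from a Content-Disposition header value.
class ContentDispositionSketch {

    // Matches parameters of the form ; name="value" (quoted values only).
    private static final Pattern PARAMETER =
        Pattern.compile(";\\s*([A-Za-z0-9*_-]+)\\s*=\\s*\"([^\"]*)\"");

    static Map<String, String> parameters(String headerValue) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PARAMETER.matcher(headerValue);
        while (matcher.find()) {
            // parameter names are case-insensitive, so they are stored in lower case
            result.put(matcher.group(1).toLowerCase(), matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> parameters =
            parameters("form-data; name=\"book\"; filename=\"book.pdf\"");
        System.out.println(parameters.get("name"));     // prints book
        System.out.println(parameters.get("filename")); // prints book.pdf
    }
}
```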

Uploaded files can be saved on the server by calling the convenience write(fileName) method. The file is saved relative to the location specified in the servlet's multipart configuration. More flexible control over uploaded files requires the use of I/O streams:

BufferedInputStream partStream=new BufferedInputStream(part.getInputStream());
BufferedOutputStream outputStream=new BufferedOutputStream(new FileOutputStream(targetPath+File.separator+fileName));
int i; byte[] buffer=new byte[8192];
while((i=partStream.read(buffer))!=-1) {
 outputStream.write(buffer, 0, i);
}
. . .
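
A variation on the stream-based approach is sketched below with java.nio.file.Files.copy, which performs the buffered copy loop internally. The sketch also reduces the client-supplied file name to its last path segment, because a name such as ../book.pdf would otherwise escape the target directory. UploadSaver, safeFileName and save are hypothetical names, and any InputStream (for example the one returned by part.getInputStream()) can be passed in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: stores an uploaded stream under a sanitized file name.
class UploadSaver {

    // Keeps only the last path segment of a client-supplied file name.
    static String safeFileName(String submittedName) {
        String normalized = submittedName.replace('\\', '/');
        String lastSegment = normalized.substring(normalized.lastIndexOf('/') + 1);
        if (lastSegment.isEmpty() || lastSegment.equals(".") || lastSegment.equals("..")) {
            throw new IllegalArgumentException("unusable file name: " + submittedName);
        }
        return lastSegment;
    }

    // Copies the stream into targetDirectory and closes it afterwards.
    static Path save(InputStream partStream, Path targetDirectory, String submittedName)
            throws IOException {
        Path target = targetDirectory.resolve(safeFileName(submittedName));
        try (InputStream in = partStream) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path directory = Files.createTempDirectory("uploads");
        InputStream content = new ByteArrayInputStream(new byte[] {1, 2, 3});
        Path saved = save(content, directory, "../book.pdf");
        System.out.println(saved.getFileName()); // prints book.pdf
    }
}
```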

MIME Message Handling

An alternative way to handle multipart entities is to use an API designed for MIME messages, e.g., the JavaMail API. The multipart entity of an HTTP request can be saved on the server and then used as a data source for JavaMail functionality:

ByteArrayDataSource dataSource=new ByteArrayDataSource(request.getInputStream(), "multipart/form-data");
BufferedInputStream inputStream=new BufferedInputStream(dataSource.getInputStream());
int i; byte[] buffer=new byte[8192];
BufferedOutputStream formData=new BufferedOutputStream(new FileOutputStream(targetPath+File.separator+"message.dat"));
while((i=inputStream.read(buffer))!=-1) {
 formData.write(buffer, 0, i);
}
. . .

In the example above, the entity of the HTTP request is retrieved via ServletInputStream. The stream and the MIME type of the entity are fed to the constructor of javax.mail.util.ByteArrayDataSource, then the form data set is saved as a MIME multipart message in the file message.dat in the target directory. To analyze the parts of the message, the file is used as the data source of a javax.mail.internet.MimeMultipart instance:

FileDataSource fileDataSource=new FileDataSource(new File(targetPath+File.separator+"message.dat")); // the file written in the previous snippet
MimeMultipart mimeMultipart=new MimeMultipart(fileDataSource);
out.println("Multipart entity has "+String.valueOf(mimeMultipart.getCount())+" parts");

Each body part is accessed via its index:

for(int partIndex=0; partIndex<mimeMultipart.getCount(); partIndex++) {
 BodyPart bodyPart=mimeMultipart.getBodyPart(partIndex); // javax.mail.BodyPart instance
 out.println("content type: "+bodyPart.getContentType());
 out.println("<br>");
 out.println("disposition: "+bodyPart.getDisposition()); // returns form-data
 out.println("<br>");
 out.println("file name: "+bodyPart.getFileName());
 out.println("<br>");
 out.println("part size: "+String.valueOf(bodyPart.getSize()));
 out.println("<br>");
 . . .
}

To obtain metainformation contained in header fields of each part, the getAllHeaders() method is employed: it returns an enumeration of javax.mail.Header objects.

Enumeration<?> allHeaders=bodyPart.getAllHeaders();
while (allHeaders.hasMoreElements()) {
 Header header = (Header) allHeaders.nextElement();
 out.println(header.getName()+"="+header.getValue());
 . . .
}
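
To show what a MIME parser has to do with such an entity, the sketch below splits a multipart body into parts and header fields using only JDK classes. It is a deliberately simplified stand-in and not how JavaMail is implemented: MultipartSplitter and ParsedPart are hypothetical names, the entity is assumed to be text, header lines are assumed not to be folded, and the boundary is assumed not to occur inside the part bodies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, simplified multipart parser (text bodies only).
class MultipartSplitter {

    // One parsed body part: its header fields and its body text.
    static final class ParsedPart {
        final Map<String, String> headers = new LinkedHashMap<>();
        String body = "";
    }

    static List<ParsedPart> split(String entity, String boundary) {
        List<ParsedPart> parts = new ArrayList<>();
        // Delimiter lines are "--" followed by the boundary value.
        String[] sections = entity.split(Pattern.quote("--" + boundary));
        // sections[0] is the preamble before the first delimiter and is skipped.
        for (int index = 1; index < sections.length; index++) {
            String section = sections[index];
            if (section.startsWith("--")) {
                break; // closing delimiter reached
            }
            if (section.endsWith("\r\n")) {
                // drop the CRLF that precedes the next delimiter
                section = section.substring(0, section.length() - 2);
            }
            int blankLine = section.indexOf("\r\n\r\n"); // separates headers from body
            String headerBlock = blankLine < 0 ? section : section.substring(0, blankLine);
            ParsedPart part = new ParsedPart();
            part.body = blankLine < 0 ? "" : section.substring(blankLine + 4);
            for (String line : headerBlock.split("\r\n")) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    part.headers.put(line.substring(0, colon).trim(),
                                     line.substring(colon + 1).trim());
                }
            }
            parts.add(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        String entity = "--XYZ\r\n"
            + "Content-Disposition: form-data; name=\"title\"\r\n\r\n"
            + "Frenchman's Creek\r\n"
            + "--XYZ\r\n"
            + "Content-Disposition: form-data; name=\"metadata\"; filename=\"metadata.xml\"\r\n"
            + "Content-Type: text/xml\r\n\r\n"
            + "<book/>\r\n"
            + "--XYZ--\r\n";
        for (ParsedPart part : split(entity, "XYZ")) {
            System.out.println(part.headers + " -> " + part.body);
        }
    }
}
```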

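The content of a part is returned as a Java object. If the content type is text/plain, which JavaMail also assumes for a part without a Content-Type header such as the title control, the content is usually returned as a string. For content types that have no registered handler, the content is returned as an input stream, so file parts are best read via getInputStream():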

Object content=bodyPart.getContent();
out.println(content.toString()); // returns 'Frenchman's Creek' for title text control
. . .
if(bodyPart.getFileName()!=null) {
 // the file name is supplied by the client and should be validated before it is used in a path
 String filePath=targetPath+File.separator+bodyPart.getFileName();
 BufferedInputStream partInputStream=new BufferedInputStream(bodyPart.getInputStream());
 BufferedOutputStream stream=new BufferedOutputStream(new FileOutputStream(filePath));
 // i and buffer are declared as in the earlier snippets
 while((i=partInputStream.read(buffer))!=-1) {
  stream.write(buffer, 0, i);
 }
 . . .
}