In the last chapter, we created a form parser. It is still incomplete, though,
because being able to upload files is one of the main advantages of using forms.
So in this chapter we will break down how a file differs from normal form data,
and then devise an approach to handle that information.
Using the raw form text program, we edit our chapter 8 example to include a file
input and upload a very small image, so we can see how a file is encoded compared
to a normal form field.
The output should look like the above image. A few differences are easily
observable. For a normal field type we have “Content-Disposition: form-data;”
prepended onto the front, followed by the name of the field, and then two line
breaks before the value of the given field.
In the case of a file, the prefix is the same, with Content-Disposition being
declared followed by the name of the field. From there we see something quite
different. The name is followed by a filename that wasn’t present before, and
we have a MIME type, “Content-Type: image/png”, sent from the client.
Following that we have two line breaks before some unusual characters. Those
characters are the raw bytes of the binary image we sent to the server, displayed
as if they were a UTF-8 string. In this case it is the data for a single-pixel image.
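To make the layout concrete, here is a minimal sketch that builds a body of the same shape by hand. The boundary string, the field name `username`, and the file name `pixel.png` are made up for illustration, and the three bytes only stand in for real image data.

```javascript
// Hypothetical boundary; a real browser generates its own random string.
const boundary = "----ExampleBoundary123";

// A normal text field: header, blank line, then the value.
const textPart = Buffer.from(
  "--" + boundary + "\r\n" +
  'Content-Disposition: form-data; name="username"\r\n' +
  "\r\n" +
  "alice\r\n",
  "ascii"
);

// A file field: the header also carries a filename and a Content-Type line.
const fileHeader = Buffer.from(
  "--" + boundary + "\r\n" +
  'Content-Disposition: form-data; name="profile_img"; filename="pixel.png"\r\n' +
  "Content-Type: image/png\r\n" +
  "\r\n",
  "ascii"
);

// Placeholder bytes standing in for the binary image data.
const fileData = Buffer.from([0x89, 0x50, 0x4e]);

// The closing boundary has two extra dashes at the end.
const closing = Buffer.from("\r\n--" + boundary + "--\r\n", "ascii");

const body = Buffer.concat([textPart, fileHeader, fileData, closing]);
console.log(body.toString("latin1"));
```

Printing the result shows the same pattern as the captured request: readable headers, a blank line, and then either plain text or unreadable bytes.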
Given this structure, we have an unusual situation where we have to work with the
same information as both text and binary data. First, we have to determine the form
data boundary to split the form data into different parts. From there we have to
parse the text of each header to retrieve the information for the given field. And
finally, following the two line breaks, we have to take the binary data from that
offset until the start of the next boundary.
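A short experiment shows why the file data has to stay binary rather than being converted to a string along with the headers. This is only a sketch; the four bytes are the start of the PNG signature, used here as sample input.

```javascript
// First four bytes of a PNG file; 0x89 is not valid UTF-8 on its own.
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47]);

// Round trip through a UTF-8 string, as if the data were treated as text.
const asText = original.toString("utf8");
const roundTrip = Buffer.from(asText, "utf8");

console.log(original.equals(roundTrip)); // false
console.log(original.length, roundTrip.length); // 4 6
```

The invalid byte is swapped for a replacement character during decoding, so the bytes that come back are not the bytes that went in. Slicing the buffer directly avoids this.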
We will simply edit our “api_form” function from the previous chapter to create
a more generalized handler that can process both standard form fields and
form files. The source code is given below.
    function api_form(req, res) {

      var raw_data = [];
      var raw_length = 0;

      // Collect each incoming chunk and keep a running byte count
      req.on("data", function(data) {
        raw_data.push(data);
        raw_length += data.length;
      });

      req.on("end", function() {

        var boundary, i, buf, file, line;
        // Join the chunks with Buffer.concat (arrays have no such overload)
        var buffer = Buffer.concat(raw_data, raw_length);
        var ofs = [];
        let form = {};
        let files = [];

        // The first line of the body, minus its "\r\n", is the boundary
        for (i = 0; i < buffer.length; i++) {
          if (buffer[i] !== 10) {
            continue;
          }
          boundary = buffer.toString("ascii", 0, i - 1);
          console.log(boundary);
          break;
        }

        // Record the offset of every occurrence of the boundary
        i = 0;
        while (i !== -1) {
          ofs.push(i);
          i = buffer.indexOf(boundary, i + 1, "ascii");
        }

        // Each pair of offsets encloses one field; the last offset is
        // the closing boundary, so it only marks the end
        for (i = 0; i < ofs.length - 1; i++) {
          buf = buffer.slice(ofs[i], ofs[i + 1]);
          // "boundary" is reused here to hold the end of the header
          boundary = buf.indexOf("\r\n\r\n", 0, "ascii");
          let header = buf.slice(0, boundary).toString("ascii");
          let key = header.match(/name="(.*?)"/)[1];
          let filename = header.match(/filename="(.*?)"/);
          // Skip the blank line and drop the "\r\n" before the next boundary
          file = buf.slice(boundary + 4, buf.length - 2);

          if (!filename) {
            form[key] = file.toString("utf8");
          } else {
            files.push({
              "name": filename[1],
              "key": key,
              "data": file
            });
          }
        }

        // Write the files one at a time and collect their URLs
        var response = [];
        async.eachSeries(files, function(file, nextFile) {
          response.push("/img/" + file.name);
          fs.writeFile("public/img/" + file.name, file.data, function(err) {
            if (err) {
              throw err;
            }
            nextFile();
          });
        }, function() {
          res.writeHead(200, { "Content-Type": "text/plain" });
          res.end(JSON.stringify(response));
        });

      });

    }
The code is not the most elegant approach, but it works, so let’s go over it.
    function api_form(req, res) {

      var raw_data = [];
      var raw_length = 0;

      req.on("data", function(data) {
        raw_data.push(data);
        raw_length += data.length;
      });
As data comes in bursts of buffers, we create an array where we store each buffer
as it comes, and store the length of the total raw buffer.
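The chunk bookkeeping described above can be tried without a server. In this sketch the three hard-coded buffers are stand-ins for what the "data" event would deliver, and they are joined at the end with `Buffer.concat`, which accepts the list of buffers and an optional total length.

```javascript
// Simulated chunks, standing in for the buffers a request would emit.
const chunks = [
  Buffer.from("first "),
  Buffer.from("second "),
  Buffer.from("third"),
];

const stored = [];
let total = 0;

// Same bookkeeping as the "data" handler: keep the chunk, add its length.
for (const chunk of chunks) {
  stored.push(chunk);
  total += chunk.length;
}

// Join everything into one buffer once all chunks have arrived.
const whole = Buffer.concat(stored, total);
console.log(whole.toString("utf8")); // first second third
```

Tracking the length up front is optional, but it saves `Buffer.concat` from having to add up the lengths itself.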
      req.on("end", function() {

        var boundary, i, buf, file, line;
        var buffer = Buffer.concat(raw_data, raw_length);
        var ofs = [];
        let form = {};
        let files = [];

        for (i = 0; i < buffer.length; i++) {
          if (buffer[i] !== 10) {
            continue;
          }
          boundary = buffer.toString("ascii", 0, i - 1);
          console.log(boundary);
          break;
        }
Once the client has finished sending the information to the server, we concatenate
all of our partial buffers into one complete buffer. The next step is to find
the first line break, as the characters up to that point are the
form boundary string. We store this string in a variable so we can separate out
the sections between boundaries.
        i = 0;
        while (i !== -1) {
          ofs.push(i);
          i = buffer.indexOf(boundary, i + 1, "ascii");
        }
From there we iterate over the entire buffer. We start from 0 and repeatedly search
for the next occurrence of the boundary string, storing each offset. This gives us
a list of sections where the boundary is at the top of a given section and its data
ends at the next offset.
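The boundary detection and offset search can be sketched on a small hand-written body. The boundary `--XYZ` and the field names `a` and `b` are invented for this example; `Buffer.indexOf` accepts a string and an optional start position.

```javascript
// A tiny made-up body with two text fields and a closing boundary.
const body = Buffer.from(
  "--XYZ\r\n" +
  'Content-Disposition: form-data; name="a"\r\n\r\n' +
  "1\r\n" +
  "--XYZ\r\n" +
  'Content-Disposition: form-data; name="b"\r\n\r\n' +
  "2\r\n" +
  "--XYZ--\r\n",
  "ascii"
);

// The boundary is everything before the first "\r\n".
const firstBreak = body.indexOf("\r\n");
const boundary = body.toString("ascii", 0, firstBreak);

// Collect the offset of every occurrence of the boundary.
const offsets = [];
let pos = body.indexOf(boundary);
while (pos !== -1) {
  offsets.push(pos);
  pos = body.indexOf(boundary, pos + 1);
}

console.log(boundary, offsets);
```

Three offsets come back for two fields, because the closing boundary is found as well. That is why the later loop stops one short of the end of the list.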
        for (i = 0; i < ofs.length - 1; i++) {
          buf = buffer.slice(ofs[i], ofs[i + 1]);
          boundary = buf.indexOf("\r\n\r\n", 0, "ascii");
          let header = buf.slice(0, boundary).toString("ascii");
          let key = header.match(/name="(.*?)"/)[1];
          let filename = header.match(/filename="(.*?)"/);
          // Skip the blank line and drop the "\r\n" before the next boundary
          file = buf.slice(boundary + 4, buf.length - 2);

          if (!filename) {
            form[key] = file.toString("utf8");
          } else {
            files.push({
              "name": filename[1],
              "key": key,
              "data": file
            });
          }
        }
It’s from there that we have to sort out several conditions. First we slice the
total raw buffer to focus on a single field segment. From there we can assume
that two line breaks define the break between the field header and the field
data. We locate the index where the header ends and convert the header to a string
to extract that information.
From there we use a regular expression to extract the field name and attempt to
extract a filename, if it exists. The rest of the segment we can assume is the data
associated with the given field from the client. If no filename exists, then we
can assume it’s a standard form input, and we store the value as a string in a form
object, using the field name as the key. Otherwise, if there is a filename, we store
the filename, the original field name associated with the form, and the binary data.
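The header and data split can be tried in isolation on one segment at a time. In this sketch, `parsePart` is a hypothetical helper written for illustration, and the boundary, field names, and placeholder bytes are made up. It also drops the final "\r\n" of the segment, since that line break belongs to the boundary that follows rather than to the field value.

```javascript
// Hypothetical helper: split one segment into its header fields and data.
function parsePart(part) {
  const headerEnd = part.indexOf("\r\n\r\n");
  const header = part.toString("ascii", 0, headerEnd);
  const key = header.match(/name="(.*?)"/)[1];
  const fileMatch = header.match(/filename="(.*?)"/);
  // Skip the blank line (4 bytes) and drop the "\r\n" before the next boundary.
  const data = part.slice(headerEnd + 4, part.length - 2);
  return { key: key, filename: fileMatch ? fileMatch[1] : null, data: data };
}

// A standard text field.
const textPart = Buffer.from(
  '--XYZ\r\nContent-Disposition: form-data; name="username"\r\n\r\nalice\r\n',
  "ascii"
);

// A file field with placeholder bytes instead of a real image.
const filePart = Buffer.concat([
  Buffer.from(
    '--XYZ\r\nContent-Disposition: form-data; name="profile_img"; filename="pixel.png"\r\n' +
    "Content-Type: image/png\r\n\r\n",
    "ascii"
  ),
  Buffer.from([0x89, 0x50, 0x4e, 0x47]),
  Buffer.from("\r\n", "ascii"),
]);

const field = parsePart(textPart);
const upload = parsePart(filePart);
console.log(field.key, field.data.toString("utf8")); // username alice
console.log(upload.key, upload.filename, upload.data.length); // profile_img pixel.png 4
```

Only the header is ever converted to a string. The data stays a buffer until we know whether it is a text value or a file.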
        var response = [];
        async.eachSeries(files, function(file, nextFile) {
          response.push("/img/" + file.name);
          fs.writeFile("public/img/" + file.name, file.data, function(err) {
            if (err) {
              throw err;
            }
            nextFile();
          });
        }, function() {
          res.writeHead(200, { "Content-Type": "text/plain" });
          res.end(JSON.stringify(response));
        });

      });

    }
From there we iterate over the files and write them to the “img” directory
in our public folder for debugging, and add each of the file URLs to a list which
we return to the client.
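The write-and-collect step can also be sketched on its own. This version is a simplification rather than the chapter's approach: it uses the synchronous `fs.writeFileSync` instead of the async series, writes to a temporary directory instead of the public folder, and passes each name through `path.basename` as a precaution, since the filename comes from the client. The `files` array is hypothetical sample data in the same shape as the parsed uploads.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical parsed uploads, in the same shape as the "files" array.
const files = [
  { name: "pixel.png", key: "profile_img", data: Buffer.from([0x89, 0x50, 0x4e, 0x47]) },
];

// Write into a throwaway directory instead of public/img.
const outDir = fs.mkdtempSync(path.join(os.tmpdir(), "upload-"));

const response = [];
for (const file of files) {
  // basename() drops any directory components a client may have sent.
  const safeName = path.basename(file.name);
  fs.writeFileSync(path.join(outDir, safeName), file.data);
  response.push("/img/" + safeName);
}

console.log(JSON.stringify(response)); // ["/img/pixel.png"]
```

In a real handler the asynchronous form is preferable, since synchronous writes block the server while each file is saved.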
The html file for this chapter is given below.
File: form.html
    <!DOCTYPE HTML>
    <html>
      <head>
        <meta charset="utf-8"/>
        <title>Ajax Request</title>
      </head>
      <body>
        <form id="exampleForm">
          <table>
            <tr>
              <td>Profile Image:</td>
              <td><input type="file" name="profile_img" multiple/></td>
            </tr>
            <tr>
              <td>Submit:</td>
              <td><input type="submit" value="Submit"/></td>
            </tr>
          </table>
        </form>
        <br>
        <pre id="responseText"></pre>
        <script type="text/javascript" src="js/form.js"></script>
      </body>
    </html>
File: js/form.js
    "use strict";

    var exampleForm = document.getElementById("exampleForm");
    var responseText = document.getElementById("responseText");

    exampleForm.addEventListener("submit", function (event) {

      event.preventDefault();

      var formData = new FormData(exampleForm);
      var xml = new XMLHttpRequest();
      xml.open("POST", "/api/form", true);
      xml.send(formData);

      xml.onload = function() {
        responseText.textContent = xml.responseText;
      }

    });