There’s something about YAML

August 16th, 2015

When you’re developing an app, you often run into the problem of needing to store structured data in some format. The crucial thing is that you absolutely cannot just make up some dumb format of your own, like:

# marty's homemade file format
# (now I get to write code to parse it from scratch; yay!)
server = "localhost"
email = ""
buddies = {

Many years ago I settled on XML as a nice standard file format. XML is basically HTML except that you can use any tags or attributes you want. It’s easy to understand if you know HTML; it’s nicely rigidly structured and can be validated if you create an XML schema specifying which tags/attributes are allowed where.

<?xml version="1.0" encoding="UTF-8"?>
    <!-- Marty's data in XML -->

XML has served me well for many years, but it’s a bit cumbersome. To pull the data out of the XML file, you need to use the XML document-object model (DOM), a set of classes and methods for talking to XML data. The good news is that the XML DOM is mostly available and standardized across lots of programming languages. The bad news is that the XML DOM is verbose and kind of clunky to use. To pull the buddies out of the above stuff, you’d have to write code like:

// this is mostly Java
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
try {
    DocumentBuilder builder = factory.newDocumentBuilder();
    InputSource source = new InputSource(IOUtils.toInputStream(xmlFileText));
    Document doc = builder.parse(source);
    // getElementsByTagName returns a NodeList, so grab the first match and cast it to Element
    Element settingsNode = (Element) doc.getElementsByTagName("settings").item(0);             // ...
    Element buddiesNode = (Element) settingsNode.getElementsByTagName("buddies").item(0);      // ...
    NodeList buddies = buddiesNode.getElementsByTagName("buddy");
    for (int i = 0; i < buddies.getLength(); i++) {
        Node buddyNode = buddies.item(i);                                                        // ...
        String arrrrrrrrrrgh = buddyNode.getTextContent();                                       // "Allison"  (finally!!)
    }
} catch (Exception e) {
    // ...
}

So, that all works, but it can be kind of a pain. A lot of folks using web apps have switched their data into JavaScript object notation (JSON) format. JSON is the cute idea that you should just declare your data as a JavaScript object literal, as a map from property/field names into values.

// Marty's data in JSON (a comment isn't legal here but oh well)
{
    "server": "localhost",
    "email": "",
    "buddies": [
        {"name": "Allison"},
        {"name": "Helene"},
        {"name": "Katlyn"},
        {"name": "Mariana"},
        {"name": "Zorah"}
    ]
}

JSON is a nice format because if your language knows how to parse JSON, you can usually just turn the data into a nice nested object and access each property directly, something like:

myJsonData["buddies"][3]["name"]   // "Mariana"

In some recent work on my Practice-It web app I was getting sick of a glut of XML data the app uses. One annoying thing about this particular data was that a lot of it was HTML snippets. The app uses XML files to describe programming exercises, each with an HTML description and a Java code solution. So I had things like:

<?xml version="1.0" encoding="UTF-8"?>
        &lt;p&gt;Write a method named "max" that accepts two integers and returns which one is greater.&lt;/p&gt;
        &lt;p&gt;For example, the call of &lt;code&gt;max(12, 47)&lt;/code&gt; returns &lt;code&gt;47&lt;/code&gt;.&lt;/p&gt;
        public int max(int a, int b) {
            if (a &gt; b) {
                return a;
            } else {
                return b;

Notice how I have to HTML-encode the description and solution because you can’t have the < > characters inside a tag in XML. It’s just a pain to have to HTML-encode everything and carefully nest all of it inside the XML.
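To make that encoding step concrete, here is a minimal sketch of the kind of helper you end up needing before HTML can be dropped into an XML element. The class and method names are made up for illustration, and it only handles the three characters that matter inside element text; a real app would more likely lean on a library routine.

```java
public class XmlEscapeSketch {
    // Replaces the characters that XML treats as markup with entity references.
    // '&' has to go first, otherwise the '&' in "&lt;" would get escaped again.
    static String escapeForXml(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String html = "<p>the call of <code>max(12, 47)</code> returns <code>47</code>.</p>";
        System.out.println(escapeForXml(html));
    }
}
```

Every description and every solution has to go through something like this on the way in, and be decoded again on the way out.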

JSON wouldn’t be much better; all the property names/values have to be quoted and JavaScript-string-encoded, and multi-line values are crappy, so if I were to convert the above to JSON, I’d get muck like:


{ "description":"<p>Write a method named \"max\" that accepts two ints and returns which one is greater.</p>\n<p>For example, the call of <code>max(12, 47)</code> returns <code>47</code>.</p>",
  "solution":"public int max(int a, int b) {\n if (a > b) {\n return a;\n } else {\n return b;\n }\n }" }

Doesn’t seem much better to me.
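For the curious, the JavaScript-string-encoding looks roughly like the sketch below. This is a hand-rolled illustration, not what any particular JSON library does; the names are hypothetical, and it only covers quotes, backslashes, and the common whitespace escapes.

```java
public class JsonEscapeSketch {
    // Wraps text in double quotes and escapes the characters that would
    // otherwise end the string or break it across lines.
    static String toJsonString(String text) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String solution = "public int max(int a, int b) {\n    return (a > b) ? a : b;\n}";
        System.out.println(toJsonString(solution));
    }
}
```

The multi-line solution comes out as one long line full of \n sequences, which is exactly the muck I don't want to author by hand.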

ENTER YAML. YAML is a data markup language commonly used with Ruby on Rails apps for config files and things of that nature. I’ve seen it called “the Python of markup languages”, which is a great description. Its syntax is brief and bare, using name: value pairs along with spaced indentation to indicate nesting. You can indicate a multi-line value with a | or > character. Look at this loveliness:

# This is YAML, MFers, awwwwww yeah
problem:
    description: |
        <p>Write a method named max that accepts
        two ints and returns which one is greater.</p>
        <p>For example, the call of <code>max(12, 47)</code>
        returns <code>47</code>.</p>
    solution: |
        public int max(int a, int b) {
            if (a > b) {
                return a;
            } else {
                return b;
            }
        }

Not all languages include a YAML parser, but I found a small Java library that turns the YAML data into a nested Map<String, String>, which is fine for my purposes:

// parseYaml is a placeholder name for the YAML library's load method
Map<String, Map<String, String>> yamlData = parseYaml(new FileReader("mydata.yaml"));
String solutionCode = yamlData.get("problem").get("solution");
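Since the parser's output is just maps inside maps, pulling a value out is plain map navigation. Here is a small sketch of that idea with a hand-built map standing in for the parsed file, since the standard Java library has no YAML parser of its own. The lookup helper and the class name are hypothetical, not part of any YAML library.

```java
import java.util.Map;

public class NestedMapSketch {
    // Walks a chain of keys through nested maps; returns null if any step is missing.
    static Object lookup(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what a YAML parser might hand back.
        Map<String, Object> yamlData = Map.of(
            "problem", Map.of(
                "description", "<p>Write a method named max ...</p>",
                "solution", "public int max(int a, int b) { ... }"));
        System.out.println(lookup(yamlData, "problem", "solution"));
    }
}
```

Returning null for a missing key keeps a typo in a data file from blowing up with an exception halfway down the chain.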

It’s just so clean and simple and nice. I’m in love. I’m planning to create most of my new problem data files in YAML rather than XML from now on; it’s just so much faster and cleaner for me to author the files.

YAML isn’t perfect. One issue is that it can be tougher to verify that a given YAML file matches some exact schema. So I think YAML is currently best suited to data that comes from a trusted source; in this case, the data is authored by me, and I trust me.
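Without a schema, the checking falls to your own code. A minimal sketch of what that might look like for one parsed problem, assuming the description and solution keys from the example above; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ProblemCheckSketch {
    // Returns a list of human-readable complaints; an empty list means the data passed.
    static List<String> check(Map<String, Object> problem) {
        List<String> errors = new ArrayList<>();
        for (String key : new String[] {"description", "solution"}) {
            Object value = problem.get(key);
            if (value == null) {
                errors.add("missing key: " + key);
            } else if (!(value instanceof String)) {
                errors.add("not a string: " + key);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, Object> problem = Map.of("description", "<p>...</p>");
        System.out.println(check(problem));   // prints [missing key: solution]
    }
}
```

It's nowhere near what an XML schema gives you, but for files I author myself it catches the typos that matter.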

Categories: Uncategorized

A few code style conventions I’ve adopted

April 16th, 2015

I write a lot of code, and here are a few little stylistic things I try to always do now.

1) Mandatory comment when passing boolean flags or weird random args: I learned this one when interning at Facebook. What’s the third arg here??

processDatabase(server, "Users", true);

No more mystery flags! You must comment it so I know what the heck I’m passing and what it means.

processDatabase(server, "Users", /*overwrite*/ true);

2) Mandatory comment on an empty block: Sometimes I have an empty block or empty loop or whatever. But here’s one in my code; did I mean to do that? Is it a bug? Just comment it and be sure. Got this one from Eric Roberts.

// should this ctor be empty?? (bad)
public FluxCapacitor() {
}

// this one is better
public FluxCapacitor() {
    // empty
}

3) Author/version comments on files: It seems like a simple/obvious thing, but just tagging my files with my name and when I last modified them has helped me a lot with “is this the newest copy of this file?” issues. I also put in what I changed between each version.

/**
 * ........ (description)
 * @author Marty Stepp
 * @version 2015/04/15
 *  - fixed bug with null pointer in query function
 *  - alphabetized function names
 * 2015/01/29
 *  - added getRandomBlerg() method
 * ...
 */

4) Two-letter for-loop counter variable names: It’s a nightmare to search for “int i” in your code. How about two letters?

for (int ii = 0; ii < 10; ii++) {
    for (int jj = 0; jj < ii; jj++) {
        // ...
    }
}

What are some little style rules that you love to follow in your own code?

Categories: Uncategorized

Movie Review: Hobbit Battle of Five Armies

December 27th, 2014

Saw Hobbit: Battle of Five Armies last night. One word review: Meh. Longer review: Unnecessary. (SPOILERS BELOW)

It’s a nicely made movie, with good visuals, good f/x, good acting, good music, good action, etc. It’s not sloppily done. But I just didn’t connect with it. It felt so unnecessary. I didn’t like the other two, either; it all just felt bloated and padded out to ridiculous levels. This movie and the last one, Desolation of Smaug, really felt like one movie to me. In this third installment, almost the entire movie is just one big battle sequence, and after a while I just stopped caring about the various characters and armies that have been poorly introduced and haven’t done anything to make me feel invested in them.

That’s the biggest sin of these movies: the weak character development. In the LotR trilogy, almost every single character was interesting and made me want to see more of their story. Think of Gimli and Legolas, think of Boromir and Faramir, think of Arwen and Eowyn, Galadriel and the elf lord Elrond. Think of Gandalf. Saruman. Every character is fucking badass. In this one, they fail to make me care about these dwarves traveling with Bilbo, to the point that I can honestly not tell you who more than 3-4 of them are, despite having their names memorized from reading the book.

I like Radagast the wizard, but why does he have bird poop all over his head and act like a dolt?? Why does the awesome bear-man Beorn barely get any screen time? Why does the awesomer Smaug barely get a line in this movie? And/or, why didn’t they just resolve the Smaug story in the last film rather than ending that one in the middle of the action? I actually didn’t mind that they added Azog the pale white orc, a made-up character for this movie so it would have a main bad guy … but then a lot of the other orcs in this one have the same pale white skin, so it’s hard to tell which one is Azog! They give a lot of screen time to this weaselly guy Alfrid and give him a unibrow and bad British teeth and have him act like a conniving jerk and give him lots of goofy comedy lines. Is he the Jar-Jar of the Hobbit?

They add Kate from Lost as a she-elf, but then her entire purpose is to be a prize for … Kili? One of the dwarves. Who just dies anyway, so nothing comes of it. And hey, why is Kate-from-Lost-elf allowed to love a dwarf like it’s no big deal, when Arwen in LotR needed to give up her elf immortality to love Aragorn? Why is Legolas in these movies, and why can he jump on/off of falling rocks as if it were a video game?

The primary villain, Smaug, gets dispatched in the first 5-10 minutes of the movie by “Bard”, some random character who just showed up near the end of the second movie and to whom the audience has almost no connection. He comes from some town full of uninteresting people, with whom the audience also has almost no connection. Yes, I know that’s how it was in The Hobbit book, but if they’re going to change things, why not strengthen this character and involve him more in the overall story?

There are also things in here that make the LotR trilogy not make sense. Like, if Gandalf is going to see and face Sauron so much in this movie, including being held captive by him and everything, then why is he completely shocked and surprised when he discovers that Sauron and the ring have returned in LotR Fellowship of the Ring? Why do the Nazgul not look or act the way they do in LotR at all?

I really wanted to love these movies because the LotR trilogy is among the best film series of all time. This Hobbit trilogy just felt like going through the motions. All the action and look/feel of LotR but none of the heart or character. Hours of CGI armies fighting and dwarves floating in barrels and white orcs, full of sound and fury, signifying nothing.

This book should not have been made into three movies. I could actually picture them making two good ones out of it, or even one, though a single film might be overly cramped. It should have been two. There were two really neat movies here, or three mediocre ones. The studio chose the latter. The Hobbit: Battle of More Monies. I can’t wait until Peter Jackson releases the Special Editions where Gandalf doesn’t shoot first.

Categories: Uncategorized