I love YAML. I use it in almost all my applications to manage configuration. It is easy to read and write and allows for complex objects as well as arrays/lists of data. In fact, YAML 1.2 is technically a complete superset of JSON; so anything you can do with JSON you can do with YAML. YAML also comes with broad cross-language support. For example, there are some powerful libraries out there for working with YAML in Java, such as SnakeYAML and YamlBeans.
The Old
Several years back, my colleague Charles Draper wrote a library to simplify the process of parsing and working with YAML configurations. Initially this library was part of an internal utility package, but about a year ago we went through and split it out into its own repository. During that process, we debated and nailed down some of the expected behavior of this config library and established a few informal guidelines that governed development:
- It had to be stupidly convenient to pull discrete pieces of data out of the Java representation of the data.
- Since we have a number of student developers who come and go every few semesters, the learning curve needed to be minimal if we wanted to have any hope of widespread usage.
- It had to be able to read from multiple files and merge them in a fashion similar to how CSS works.
I want to talk briefly about that third requirement, since it stands apart from the general case of “I just want to parse a YAML file.” In several situations, we have found it beneficial to have a cascading configuration loader that parses a baseline configuration and then reads subsequent files that tweak the behavior for a specific server or environment.
For example, imagine that an application relies on an external database. In production, we of course want to read/write to the production database, but perhaps on our staging and development servers, we want to work with a test database instead. Nothing too crazy.
So we might have a baseline application.yml file that contains the bulk of our configuration along with some placeholders for database connection values. And on our production and staging servers respectively we create a production.yml and stage.yml file containing the specific connection values for that environment, as follows:
```yaml
# application.yml
---
database:
  host: localhost
  username: myuser
  password: mypassword
# other configuration as appropriate...
```
```yaml
# production.yml
---
database:
  host: proddb.example.com
  password: prodpassword
```
```yaml
# stage.yml
---
database:
  host: stgdb.example.com
  username: stguser
  port: 3306
```
At application startup, then, what we really want to do is load the baseline application.yml and then substitute the appropriate fields from the system-specific YAML files. The way we handled this was with a master Config object into which we loaded one or more YAML files. The Config object provided a long list of accessor methods that allowed data to be pulled out using XPath-style references.
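For context, the access pattern looked roughly like the following sketch. This is a hypothetical illustration only (the old Config class and its real method names are not shown in this article); it just demonstrates the idea of walking nested data with a slash-delimited path:

```java
import java.util.Map;

// Hypothetical sketch of XPath-style access over nested config data.
// The old Config library's actual API may have differed.
public class ConfigSketch {

    static Object get(Map<?, ?> data, String path) {
        Object current = data;
        for (String segment : path.split("/")) {
            if (!(current instanceof Map)) return null; // path is deeper than the data
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> config = Map.of(
                "database", Map.of("host", "localhost", "username", "myuser"));
        System.out.println(get(config, "database/host")); // prints "localhost"
    }
}
```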
It worked, and we were quite proud of what we built. But we have found something even better!
The New
I recently discovered that the venerable Jackson library has a YAML plugin. This plugin uses SnakeYAML under the hood to parse the YAML data using an ObjectMapper, which means that the output of the parsing operation can be a standard JsonNode or even a targeted POJO bean.
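Using the plugin directly is as simple as handing a YAMLFactory to an ObjectMapper; everything downstream is ordinary Jackson. The sketch below shows the mechanism (it is not the config library's actual source), with class names of my own choosing:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.yaml.YAMLFactory;

// Minimal sketch of the jackson-dataformat-yaml plugin in action.
public class JacksonYamlDemo {
    public static void main(String[] args) throws Exception {
        // Constructing an ObjectMapper with a YAMLFactory makes every
        // read/write operation speak YAML instead of JSON.
        ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
        JsonNode node = mapper.readTree("a: alpha\nc:\n  d: delta\n");
        System.out.println(node.path("a").asText());           // prints "alpha"
        System.out.println(node.path("c").path("d").asText()); // prints "delta"
    }
}
```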
Armed with this plugin, we gutted our Config library this week and rebuilt it to use Jackson as the data parser. We are thrilled with how easy it was to set up and how simple it is to use. Our master Config object is gone altogether; applications now interact directly with the loaded data using the JsonNode or POJO outputs mentioned above. Check out how easy this makes loading and interacting with configuration data:
```yaml
# example.yml
---
a: alpha
b: bravo
c:
  d: delta
```
```java
YamlLoader loader = new YamlLoader();
Path sourcePath = Paths.get("example.yml");

// The config library can load data into a generic JsonNode
JsonNode node = loader.load(sourcePath);
System.out.println(node.path("a").asText());           // Outputs "alpha"
System.out.println(node.path("b").asText());           // Outputs "bravo"
System.out.println(node.path("c").path("d").asText()); // Outputs "delta"
```
```java
YamlLoader loader = new YamlLoader();
Path sourcePath = Paths.get("example.yml");

// The config library can also load data into a Jackson-annotated POJO
ExamplePOJO pojo = loader.load(ExamplePOJO.class, sourcePath);
System.out.println(pojo.a);   // Outputs "alpha"
System.out.println(pojo.b);   // Outputs "bravo"
System.out.println(pojo.c.d); // Outputs "delta"
```
```java
public static class ExamplePOJO {
    @JsonProperty String a;
    @JsonProperty String b;
    @JsonProperty Charlie c;

    public static class Charlie {
        @JsonProperty String d;
    }
}
```
Stupidly convenient? Check.
Minimal learning curve? Check.
But what about cascaded loading?
Since at the end of the day, we’re dealing with native Jackson objects, we did a little hunting to see if Jackson supported deep merge operations out of the box. As far as we can tell, it does not at the time of this writing. However, there is an open feature request for exactly that, and a number of people have rolled their own merge methods.
We took one of those methods and broke it down to understand exactly how it was working. We then added our own version of this merge logic into the YamlLoader class. In a nutshell, it merges two JsonNode objects A and B using the following logic:
- If A is a missing node, then simply add B.
- If either A or B is a simple field, then replace A with B.
- If either A or B is an array, then replace A with B.
- If both A and B are complex objects, then recursively call merge on each child element.
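The rules above can be sketched against Jackson's node API as follows. The method name and placement are illustrative, not the library's actual code:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

// Illustrative deep merge of two JsonNode trees per the four rules above.
public class MergeSketch {

    static JsonNode merge(JsonNode a, JsonNode b) {
        // Rule 1: A is missing, so B wins outright.
        if (a == null || a.isMissingNode()) return b;
        // Rules 2 and 3: simple fields and arrays are replaced wholesale, never merged.
        if (!a.isObject() || !b.isObject()) return b;
        // Rule 4: both are complex objects, so recurse on each of B's children.
        ObjectNode result = ((ObjectNode) a).deepCopy();
        b.fields().forEachRemaining(entry ->
                result.set(entry.getKey(), merge(result.path(entry.getKey()), entry.getValue())));
        return result;
    }

    public static void main(String[] args) {
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode base = mapper.createObjectNode();
        base.putObject("database").put("host", "localhost").put("username", "myuser");
        ObjectNode override = mapper.createObjectNode();
        override.putObject("database").put("host", "proddb.example.com");

        JsonNode merged = merge(base, override);
        System.out.println(merged.path("database").path("host").asText());     // prints "proddb.example.com"
        System.out.println(merged.path("database").path("username").asText()); // prints "myuser"
    }
}
```

Note that only rule 4 recurses; replacing arrays wholesale (rule 3) avoids the ambiguity of trying to merge lists element by element.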
We have now theoretically satisfied our three basic requirements, so let’s see how this would work in our original example. We will instruct the YamlLoader class to load our baseline application.yml and then overwrite some data with values from the production.yml and stage.yml files:
```java
YamlLoader loader = new YamlLoader();

// Cascade load application.yml and production.yml. Values in production.yml should
// trump values in application.yml.
ExamplePOJO pojo = loader.load(ExamplePOJO.class,
        Paths.get("application.yml"),
        Paths.get("production.yml"));
System.out.println(pojo.database.host);     // Outputs "proddb.example.com" (production.yml)
System.out.println(pojo.database.username); // Outputs "myuser" (application.yml)
System.out.println(pojo.database.password); // Outputs "prodpassword" (production.yml)
System.out.println(pojo.database.port);     // Outputs "null" (referenced by neither file)

// This final example doesn't really jibe with our hypothetical situation, but it's
// still interesting from a demo perspective. In this case, stage.yml will trump
// values from the original application.yml, but will itself be trumped by values
// in production.yml.
pojo = loader.load(ExamplePOJO.class,
        Paths.get("application.yml"),
        Paths.get("stage.yml"),
        Paths.get("production.yml"));
System.out.println(pojo.database.host);     // Outputs "proddb.example.com" (production.yml)
System.out.println(pojo.database.username); // Outputs "stguser" (stage.yml)
System.out.println(pojo.database.password); // Outputs "prodpassword" (production.yml)
System.out.println(pojo.database.port);     // Outputs "3306" (stage.yml)
```
```java
public static class ExamplePOJO {
    @JsonProperty DatabasePOJO database;
    // Other fields as defined in application.yml (outside the scope of this demo)

    public static class DatabasePOJO {
        @JsonProperty String host;
        @JsonProperty String username;
        @JsonProperty String password;
        @JsonProperty Integer port;
    }
}
```
Conclusion
We think this is a pretty slick way of managing configurations and will start rolling it out across our applications. We have open sourced it, so if you are interested in using it, please check out the repository and let us know what you think of it in the comments below!
Maven: Coming Soon
Author’s note: The statements and views expressed in this article are the author’s own and do not represent the view of Brigham Young University or its sponsors.