Rethinking an API for Cacheability

At the beginning of anything new, there is just so much uncertainty. The first versions of an API built to support a growing, yet-to-be-defined product can easily be built on top of quicksand. You’re doomed to make mistakes, but if you overcome those errors and get something tangible, you may get a second chance: redoing, but better.

Redoing gives you the possibility of starting with the favorable, unfair advantage of knowing (a little) about what doesn’t work. You may not know precisely what works, but you know a few things that don’t.

And this is amazing for your API.

I recently wrote about our API migration from REST/JSON to GraphQL. A migration between different API query protocols is an opportunity to redesign your API as a whole. If you limit it to a protocol translation, you’re not using all the additional knowledge acquired the first time, and you’re wasting resources. By the way, if you decide to redo something, make sure you do it clearly better, to compensate at least for the loss in time to market.

Our original API grew like a cute little monster — as most software tends to do. As new features needed to be implemented, stuff was added to the API, on top of what was already there. Fields. Endpoints. Relationships. More fields.

Like the figures in a geometric pattern from a coloring book, there was no explicit starting point and no well-defined way things were supposed to communicate, although they were somewhat correlated.

Our REST API as a geometric pattern from a coloring book
It is quite hard to explore an API like this for the first time without reading detailed, boring documentation.

Fresh start: what to return?

In the API world, we looked at that question in terms of data vs. policies. Policies are a subset of the whole business logic, which also somehow includes your data, or at least the way you model and validate it. Our APIs were full of policies, and by rethinking our design we realized these were actually harmful.

Our business domain talks about Lists of people who attend Events. On every List we have many Invitations, each of which roughly represents the attendance of a person at an Event through a List. We have a logged-in user (the Viewer) who may or may not be allowed to delete any one of the invitations in each list of an event. This question can be answered with an authorization policy, which would be calculated, server-side, for each invitation in an event (and an event may have thousands of them).

For simplicity, let’s say we could calculate this policy considering data about the Event (it may be owned by the Viewer), the List (it may have been created by the Viewer), or the Invitation (the Viewer may be the invitee, the inviter, or both). Since we had thousands of invitations, performance could become an issue, but we could cache these calculations to prevent repeating them every time, right?

Kind of.

Policies usually depend on who is asking. Caching would need to happen on a per-viewer basis, and given that our permission model includes multiple staff members for the event, caching could help but wouldn’t be really efficient, performance- and memory-wise.

But if our API only provided the final products (data) we have built, instead of policies, caching would become much easier. Caching Invitation responses could be as trivial as handling a few timestamps, as it doesn’t depend on the Viewer anymore. It’s up to our clients to calculate the policy themselves from the raw data returned by the API, but this is hardly a problem, as most policies can be expressed as simple boolean equations.
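As a sketch, assuming hypothetical field names (not our actual schema), the “may the Viewer delete this invitation?” policy described above boils down to a boolean equation over plain data the client already has:

```ruby
# Hypothetical sketch: a client-side delete policy computed from raw,
# cacheable API data. None of these inputs depend on who is asking,
# so the server responses can be cached per record, not per viewer.
def can_delete_invitation?(viewer_id, event, list, invitation)
  viewer_id == event[:owner_id] ||        # Viewer owns the Event
    viewer_id == list[:creator_id] ||     # Viewer created the List
    viewer_id == invitation[:inviter_id] || # Viewer sent the Invitation
    viewer_id == invitation[:invitee_id]    # Viewer is the invitee
end

event = { owner_id: 1 }
list = { creator_id: 2 }
invitation = { inviter_id: 3, invitee_id: 4 }
can_delete_invitation?(1, event, list, invitation)  # => true
can_delete_invitation?(5, event, list, invitation)  # => false
```

The server would still enforce the real policy on write; the client-side copy exists only to drive the UI.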

Having the data may be another issue, especially if you have an API that raises rigid walls within your data. But once you have the required data, you’re good to go.

Policies are just one example of information an API may return that depends on multiple contexts. Whenever such a calculation can also (safely) be done client-side, you should consider it. Obviously, delegating the calculation of policies to the client side doesn’t mean your server doesn’t need to enforce them whenever a client requests an operation on your data.

Originally posted on Medium

Endpoints raise rigid walls within your data

Relationships allow you to build amazing products and services. Don’t let your API get in the way.

We were quite happy with our REST API, until we needed to render a lot of data in a single request, which was quite painful performance-wise. We were on our way to breaking down the data requirements by analyzing scenarios and tailoring requests to the essential information. From the webapp perspective, we would define what data was needed for each use case, so that we could gradually request it in tiny little pieces, on demand.

… but …

Endpoints raise rigid walls within your data.

And we used to need those walls. They helped us reason about the server-client data flow, limit the context of authorization policies, name things. They helped us limit the exact quality that makes our data special: the relationships within.

With a resource-centric API, achieving granular data retrieval often requires multiple requests. You would be happy if you could get away with requests you can parallelize, but the truth is harsh: in most cases they’re quite interdependent. You know, since your data is full of relationships.
So we acknowledged that going granular increased our client complexity, and that perceived performance would also suffer.

We realized we had to change focus: to write an API that expands possibilities, allowing clients to name new relationships and extract information we did not know we had. The cool thing is, Facebook had already thought about this and developed GraphQL.
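To illustrate the shift, here is a hypothetical GraphQL query (the field names are assumptions for illustration, not our real schema) that fetches an event, its lists, and their invitations in a single round trip — exactly the kind of relationship traversal that would have taken several REST requests:

```graphql
{
  event(id: "123") {
    name
    lists {
      name
      invitations {
        inviter { id name }
        invitee { id name }
      }
    }
  }
}
```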

In less than a week we made a proof-of-concept implementation of a GraphQL API using the amazing Ruby gem, and had most of our REST API translated (oops!). We’re talking about implementing roughly the requirements that took our engineering team months to develop.

Adapting our Backbone.js-based client to use the new API was easy and straightforward: we just needed to redefine the Backbone.sync behavior and tweak some variables. Nothing major.

But translating is not enough. We’re now rethinking the ways clients interact with our data, given the new capabilities brought by GraphQL. In this process we’ve learned a few more things, like how nice it is to have a self-descriptive API with an amazing interactive console, and not having to write or maintain so much documentation.
I’ll let you know how that goes soon :)


Bringing your workflow breakers to your command line

Do you know that Ruby method you’d use to parse an ISO datetime string? You know, that one?

Forgetting the precise name of a method, or the most idiomatic way to solve some simple task, can happen quite often, especially when you keep changing programming languages. When this happens, my first step is usually to open an interactive session of whatever language I’m using, like Ruby’s pry, and try to remember the method or strategy needed while frenetically pressing tab and praying to the autocomplete gods. I do this because 90% of the time I find what I was looking for quite quickly, without having to context-switch out of my workflow that much (poor me if I was doing (ouch) PHP, with no decent interactive REPL to play with).

But it *seems* my problems are over now. I just found out about this amazing tool called how2. It basically puts Stack Overflow in your bash. I can now find out much more quickly that the method used to parse ISO 8601 datetime strings is, well, DateTime::iso8601. ;D
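For the record, here is that call, a tiny sketch using only Ruby’s standard library (the sample timestamp is arbitrary):

```ruby
require "date"

# Parse an ISO 8601 datetime string with Ruby's standard library:
t = DateTime.iso8601("2001-02-03T04:05:06+07:00")
t.year  # => 2001
t.hour  # => 4
```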

Pro tip: create a super alias for each programming language you use, like I did for ruby alias how2r='how2 -l ruby' and save yourself some precious keystrokes!

Writing a console game by making mistakes

I’ve decided to write a terminal, character-based game. The main goal was to exercise a few software design concepts / patterns. So I started planning the game and afterwards figuring out the classes and entities needed for the overall project and design. Then I started TDDing on top of the initial design, and before I realized it, I suddenly had 8 classes. I had a Game class, a ConsoleViewport class, a Player, a GameInitializer class, a SpaceShuttle, a SpaceShuttleDisplayer… uff!

The initial goal was to do something extremely simple, but wanting to make sure everything had a single responsibility before actually getting some perceivable behavior led me to create new abstractions without any noticeable gain. There is no concrete issue in having all the above classes. But I think there is an issue, considering that at this point in time the code couldn’t really do anything (not even display the game scenario). Abstractions must be justifiable, or they only add complexity.

So, the next step is to throw it all away and try BDD to lead the design. I’ll let you know the results.

Rails / Unit testing ActiveRecord callback chain

I’m back programming on Rails after a few months away. Apart from having to get reacquainted with most of the tools, it took me a little while to get back on my feet regarding my workflow.
One difficulty was getting unit testing right. After struggling a bit with RSpec matchers, I faced two different issues: how to test a significantly procedural algorithm (which, believe me, was already as OO as possible), and the second one, which I’ll talk about in this post: testing ActiveRecord callbacks.

Regarding a specific model, I needed to test a whole lot of business logic implemented in its before_save callback chain (let’s not dive into that, please). This callback chain consisted of 5 different methods, each with one specific responsibility. The correct implementation of the business logic depended on the callback order and, of course, on the correctness of each one of them.

I started by testing each single callback, always following the same strategy: setting up initial models/fixtures to ensure a specific test case was being exercised, running, and checking either state or message expectations afterwards. This worked pretty well, except for one thing: when calling save we’re actually involving a whole lot of logic, not only the callback we’re currently interested in, but also additional callbacks and other ActiveRecord interactions.

The result was that to test a single aspect involved in the whole callback chain, my tests had to set up fixtures or mock other unrelated aspects. Additionally, calling save triggers database interaction, which tends to lead to pretty slow test suites. I remember in the past having to deal with extremely long-running test suites just because of the excessive use of persistence operations, which aren’t really necessary most of the time. It didn’t take much time for me to realize I was doing it all wrong again, like in the good old days. The final approach I took was the following:

A first few tests ensured the callback order was being respected, which would also serve as documentation for other developers (this is quite relevant, since relying on callback order for correctness tends to be quite risky; but, well, let’s not discuss that now either). A second group asserted that the whole save / callback chain would lead to the expected overall business behavior. These initial tests were the only ones touching the database. After these came dozens of other tests exercising just the specific callbacks being executed at each point.

I like to think of this approach as a divide-and-conquer strategy. First we ensure the correct messages are being sent (which alerts us in case someone accidentally removes a before_save callback from the chain), and that they go in the correct order. Secondly, we ensure each small message does its own job correctly. I’d say this is all unit testing, but this approach creates a relation between those two types of tests similar to the one between integration and unit tests.
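The two layers can be sketched with a self-contained miniature (plain Ruby, no ActiveRecord; the model, callback names, and behavior are all hypothetical):

```ruby
# A tiny stand-in for a model with a before_save chain. The CALLBACKS
# constant doubles as documentation of the required callback order.
class Invitation
  attr_accessor :name, :status

  CALLBACKS = [:normalize_name, :assign_defaults].freeze

  def save
    CALLBACKS.each { |cb| send(cb) }
    # ... persistence would happen here ...
    true
  end

  private

  def normalize_name
    @name = @name.to_s.strip.capitalize
  end

  def assign_defaults
    @status ||= "pending"
  end
end

# Layer 1: the whole chain, end to end (the only "expensive" test).
i = Invitation.new
i.name = "  ada  "
i.save
raise unless i.name == "Ada" && i.status == "pending"

# Layer 2: each callback in isolation; no save, no persistence involved.
j = Invitation.new
j.name = " grace hopper "
j.send(:normalize_name)
raise unless j.name == "Grace hopper"
```

In real RSpec the first layer would also assert the order with ordered message expectations; the shape of the split is the same.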

The resulting test suite is pretty comprehensive (assuming one does a good job when testing the isolated callbacks), and at the same time it’s blazing fast.

Splat-splatting Function Calls in Python 3 (or, formally, argument unpacking)

Python 3 supplies many different argument-passing strategies for function calls. The basic ones are positional and keyword parameters, plus the ability to supply default values. We can also define functions with a variable number of positional and/or keyword arguments.

This allows for great flexibility and for the implementation of methods/functions which can respond to a vast spectrum of cases.

Using varargs in positional fashion is quite straightforward, and most people with some experience in other languages have already used this technique before. Combining methods that accept varargs with argument unpacking (also available in most languages) makes for a very fluid programming experience. What I did not know until recently is that we can actually apply argument unpacking to our own custom objects (like special list objects, for instance).

Here’s a very simple example of a varargs function, which simply prints arguments in a single line (much like the built-in print :D):

>>> def print_args(*args):
...    print(" ".join(args))

>>> print_args("a", "pretty", "cat")
a pretty cat

Great, what about argument unpacking?
Argument unpacking is the ability to expand the contents of an ordered (or not) collection as arguments of a function call.
As an example, let’s take a look at two ways to transpose a matrix, one using nested list comprehensions and the other one with a simple zip and argument unpacking (example from python 3 tutorial):

>>> matrix = [
...     [1, 2, 3, 4],
...     [5, 6, 7, 8],
...     [9, 10, 11, 12],
... ]

>>> [ [row[i] for row in matrix] for i in range(len(matrix[0])) ]
[[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]]

>>> list(zip(*matrix))
[(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)]

In other languages the argument unpacking operator (an asterisk in the case of Python and Ruby) is also referred to as “splat”. So, for the sake of brevity, I shall use this naming convention too, although it’s not really used by the Python community. So, let’s splat:

>>> a_list = ["steals", "your", "food"]
>>> print_args(*a_list)
steals your food

Just like you can splat list elements into a varargs function, we can splat dictionaries into a function taking variable keyword arguments:

>>> def print_with_labels(**kargs):
...    print(" ".join( v[0] + ":" + v[1] for v in kargs.items() ))

>>> a_dict = { "while": "you", "write": "code" }
>>> print_with_labels(**a_dict)
while:you write:code

Note the double asterisk, which stands for keyword-argument unpacking. Again, for brevity and clarity (and a bit of humour), I’ll call the double asterisk splatsplat, as suggested by Josh Lee on Stack Overflow.

Note about dictionary key ordering: in the above example I was lucky, and the keys were passed to the method in the order I wanted, generating a little phrase (or I may have rigged the example xD). However, since standard dictionaries did not guarantee ordering before Python 3.7 (CPython 3.6 preserved insertion order only as an implementation detail), it could have been otherwise.

So, what’s the relation between the single asterisk operator and the double asterisk operator, if any? Is there any kind of recursion relation between splat and splatsplat? Can I play with the associativity of splatsplat? Well, except for the last question (which, I have to assume, was actually too much), the rest of the inquiries revealed some tricky stuff.

So I started by playing a little bit with the REPL, and after a few tries I got this:

>>> print_with_labels(*a_dict)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: print_with_labels() takes 0 positional arguments but 2 were given

Calling the method using only a splat on a dictionary did not raise some strange runtime or syntax error but, instead, a TypeError, with an error message telling me that my single-splatted dict generated 2 positional arguments in the actual function call. So, why not try the same with print_args?

>>> print_args(*a_dict)
while write

That’s interesting! By applying the single asterisk operator I actually passed the dictionary keys as positional arguments!

I wasn’t satisfied, so I decided to splat some animals:

>>> class FluffyCat:
...     pass
>>> b = FluffyCat()
>>> print_args(*b)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: print_args() argument after * must be a sequence, not FluffyCat

So our interpreter is complaining that FluffyCat is *not* a *sequence*. I managed to solve this by extending, but that sounded strange, since Python should really require behavior (duck typing) and not a specific class/inheritance.

Eager for answers, I decided I needed a sequence of cats that I could use as a splatted argument into print_args! And so I did:

>>> class CatsList():
...    def __init__(self, *initial_cats):
...       self._cats = initial_cats
...       self._index = len(initial_cats)
...    def __iter__(self):
...       self._index = len(self._cats)
...       return self
...    def __next__(self):
...       if self._index == 0:
...          raise StopIteration
...       self._index = self._index - 1
...       return self._cats[self._index]

>>> # And here is my cats list:
>>> cats = CatsList("Felix", "Ofelia", "Lidia", "Tigre")
>>> cats._cats
('Felix', 'Ofelia', 'Lidia', 'Tigre')
>>> # and now we can print it:
>>> print_args(*cats)
Tigre Lidia Ofelia Felix

My suspicion was correct: although the previous error message stated the object actually had to *be* a *Sequence*, the only thing that matters is whether or not the object quacks (behaves) like (ish) a Sequence.

What is the difference between FluffyCat and CatsList? CatsList implements the iterator protocol, and hence the interpreter is able to resolve argument unpacking on it by calling iter() and next(). As you can see, we do not need to implement the full Sequence behaviour, with methods like __len__, __getitem__, et cetera.

This is the reason why applying a single argument unpacking operator to a dictionary actually yields its keys as arguments: dictionaries implement the iterator protocol over their keys (the same reason you can write for key in my_dict: (…)).

That’s it for splats! I hope this article was useful. Please leave your comments below :)

Disclaimer: no animals were harmed in the making of this post.