Part 2: Further adventures in parsing, the REST sync interface

A few weeks ago I wrote about my annoyances with writing an updater for my configuration parser/configuration visualizer thingy.

I was at that point relatively sure that I wasn't able to write something generic to cover the use case. Luckily, I was wrong.

All it took was for me to write a couple of hundred lines of code to update various objects. The more I wrote, the more I started repeating earlier patterns I had discovered for solving the tasks.

One of the cornerstones of Python is DRY - Don't Repeat Yourself - and repeating myself was exactly what I was doing.

Once I had spent the majority of a long train trip from Gothenburg to Copenhagen writing the import logic for a single object, I was ready to refactor.

What was the problem again?

The problem is that I have a network parser that returns a dictionary of dictionaries of parsed data from the network. It's all strings, and it's all without relations between objects.
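
To make that concrete, here's roughly the shape of the data I'm dealing with. The keys and values are invented for illustration: one dictionary per object type, inside it one dictionary per object, everything a string and no database keys anywhere.

# Illustrative only - the actual keys and fields come from my parser,
# but the shape is the point: nested dicts, all string values, no IDs
parsed_config = {
    'nodes': {
        'core-01': {'name': 'core-01'},
    },
    'ports': {
        'ge-0/0/1': {
            'name': 'ge-0/0/1',
            'description': 'uplink to core-01',
            'vlan': '123',
        },
    },
}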

On the other hand I have a backend, updatable by REST. This is built with Django and has the full database functionality including relations between objects resolved using internal database keys (ID sequences).

In my first attempt, it took a lot of effort to make sure I was comparing like with like when working out whether the backend needed updating. For example, a VLAN of "123" is not the same as 123 (one being a string, the other an integer).

This led me to write rather bespoke code for each object that I needed to process, and I previously couldn't find a smart way to go about it. I may have found that way now.

Here are some principles I've used to make it simple.

It's my code, I can make assumptions!

I could sense my mind drifting towards recreating Django-like complexity in order to make the interfaces as clean and reusable as possible.

In the end, simplicity luckily won.

One of the concepts I needed to introduce was a schema component. Something that I could use to make sure that I was comparing with the right kind of types. The solution ended up being quite simple:

# Schema for a network port (defined on the model class). Each value is a
# callable used to cast the field before comparison.
schema = {
    'name': str,
    'description': str,
    'admin_down': bool,
    'node': self.get_node_id
}

def compare(self, litem, ritem):
    """litem is the local item, ritem is the remote item.
    Returns True if any field differs once both sides have been
    run through the schema callable, i.e. the backend needs updating."""
    needs_update = False
    for key, cast in self.get_schema().items():
        if cast(litem.get(key, None)) != cast(ritem.get(key, None)):
            needs_update = True
            break

    return needs_update

This makes sure we compare all parameters using their correct datatypes, and it lets us pass functions that resolve foreign keys, such as get_node_id here.
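
As a quick illustration of why this helps, here's a stripped-down, standalone sketch of the same idea (the vlan field and the int caster are just examples, not part of my actual schema): casting both sides through the schema callable means "123" from the parser and 123 from the backend compare as equal.

# Standalone sketch, not the real class: cast both sides before comparing
schema = {
    'name': str,
    'vlan': int,  # the parser gives us the string "123", the backend gives 123
}

def compare(litem, ritem):
    for key, cast in schema.items():
        if cast(litem.get(key, None)) != cast(ritem.get(key, None)):
            return True  # something differs, the backend needs an update
    return False

local = {'name': 'ge-0/0/1', 'vlan': '123'}
remote = {'name': 'ge-0/0/1', 'vlan': 123}

print(compare(local, remote))  # False - nothing to update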

I still like simple_rest_client

I make extensive use of a Python package called simple_rest_client, as it allows me to define API endpoints and actions dynamically, while having a parent class that holds the root URI and authentication parameters.

By being a little smart about naming, I can name my classes the same as the API endpoint, and that saves me some coding:

from simple_rest_client.resource import Resource


class AlcoveGenericModel(object):
    """Generic Model to facilitate model syncing using the DRY principle"""

    def __init__(self, **kwargs):
        self.dry_run = kwargs.get('dry_run', False)
        self.actions = []
        self.api = kwargs.get('api')
        self.config = kwargs.get('config')
        if hasattr(self, 'model_actions'):
            # The subclass defines its own REST actions
            class RestResource(Resource):
                actions = self.model_actions

            self.api.add_resource(
                resource_name=self._get_class_name(),
                resource_class=RestResource
            )
        else:
            # Try to add a generic one with the default CRUD actions
            self.api.add_resource(
                resource_name=self._get_class_name()
            )

If I subclass the above as Nodes, I automatically get full CRUD actions for MY_BACKEND_URL/nodes/, or I can specify self.model_actions to provide my own actions.
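
To show what I mean, here's a rough sketch of how such a subclass gets wired up. The API root URL is made up, and I'm assuming here that _get_class_name() (not shown) simply returns the lowercased class name, so that Nodes maps to the nodes endpoint.

from simple_rest_client.api import API

class Nodes(AlcoveGenericModel):
    # No model_actions defined, so __init__ falls back to the generic
    # resource, which gives us the default CRUD actions for /nodes/
    pass

# api_root_url is a placeholder; _get_class_name() is assumed to lowercase
# the class name
api = API(api_root_url='https://my-backend.example.com/', json_encode_body=True)
nodes = Nodes(api=api, config={'nodes': {}})

# simple_rest_client now exposes the endpoint as api.nodes, so calls like
# api.nodes.list() or api.nodes.create(body={...}) go to MY_BACKEND_URL/nodes/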

Empty filters and pass-through functions

My base class includes functions that do absolutely nothing, or that just return whatever is given to them. They exist so that subclasses can override them to handle special cases.

For example:

class AlcoveGenericModel(object):
    """Generic Model to facilitate model syncing using the DRY principle"""
#...
    def get_items(self):
        """ Returns self.items.
        Override to manipulate the items.
        """
        class_name = self._get_class_name()

        try:
            items = self.config.get(class_name, None)
        except AttributeError:
            # self.config was never set on the instance
            raise AlcoveSyncModelNotImplementedError(
                'items not set in class'
            )

        if items is None:
            raise AlcoveSyncModelError(
                f'Nothing named {class_name} in config'
            )

        return self.filter_local_items(items)

    def filter_local_items(self, items):
        """ Override to filter the local items """

        return items

Here get_items() looks for a key in the self.config dict named like the class it's called from, and runs the resulting items through self.filter_local_items().

By default, filter_local_items() just passes the items along untouched.

However, in the case of network Ports, the parser returns these as a dictionary, whereas the REST backend has them in a list, so in this particular case we can use filter_local_items() to convert the parsed structure.

class Ports(AlcoveGenericModel):
    def __init__(self, **kwargs):
        self.node_id = kwargs['node_id']
        self.model_actions = {
            'list': {
                'method': 'GET',
                'url': f'nodes/{self.node_id}/ports'
            },
            'create': {
                'method': 'POST',
                'url': 'ports'
            },
            'retrieve': {
                'method': 'GET',
                'url': 'ports/{}'
            },
            'update': {
                'method': 'PUT',
                'url': 'ports/{}'
            },
            'partial_update': {
                'method': 'PATCH',
                'url': 'ports/{}'
            },
            'destroy': {
                'method': 'DELETE',
                'url': 'ports/{}'
            },
        }
        self.schema = {
            'name': str,
            'description': str,
            'admin_down': bool,
            'node': self.get_node_id
        }

        super().__init__(**kwargs)

    def get_node_id(self, *args):
        return self.node_id

    def filter_local_items(self, items):
        return list(items.values())

This is actually the full Ports sync code, just to show how short it can be. I have a REST endpoint that only gives me ports from a particular node. If that were not the case, there is a similar function, filter_remote_items(), I could use to make sure I only get the remote items I'm interested in (see the sketch below).
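
For completeness, a hypothetical override of filter_remote_items() could look something like this. It isn't in my code, since my endpoint already scopes by node, but it shows the pattern; I'm assuming the remote port objects carry a node field, as in the schema above.

    def filter_remote_items(self, items):
        """Hypothetical override: keep only the ports that belong to the
        node we're currently syncing."""
        return [item for item in items if item.get('node') == self.node_id]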

The actual update logic

Here's one of the two update functions. The other one just updates a single object and returns its ID. This one expects a list.

    def update_list(self):
        items = self.get_items()
        remote_items = self.get_remote_items()

        # Some models support bulk endpoints; default to one request per item
        bulk = getattr(self, 'bulk', False)
        if bulk:
            bulk_create = []
            bulk_update = []

        for item in items:
            # We need to schema validate the object
            validated_item = self.validate(item)

            remote_item = self.lookup_item(validated_item, remote_items)
            if remote_item is None:
                # Item does not exist remotely, so it needs to be created
                if bulk:
                    bulk_create.append(validated_item)
                else:
                    self.perform_create(validated_item)

            else:
                # Item exists remotely; compare it and update if it differs
                if self.compare(validated_item, remote_item):
                    if bulk:
                        bulk_update.append(validated_item)
                    else:
                        self.perform_update(validated_item)

        if bulk:
            if bulk_create:
                self.perform_bulk_create(bulk_create)
            if bulk_update:
                self.perform_bulk_update(bulk_update)

        # Anything that exists remotely but not locally gets deleted
        delete_items = self.find_deleted_items(items, remote_items)
        for item in delete_items:
            self.perform_delete(item)

This can afford to be a bit simplistic because, in the end, I'm reading from the source of truth, so local is always right.
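
The helpers used above - validate(), lookup_item(), find_deleted_items() and the perform_* methods - aren't shown here. To give an idea of the two lookup helpers, here's a simplified sketch of how they could be written, assuming items are matched on their name field (which is an assumption, not the actual implementation):

    def lookup_item(self, litem, remote_items):
        """Simplified sketch: find the remote counterpart of a local item,
        or return None if it does not exist yet."""
        for ritem in remote_items:
            if ritem.get('name') == litem.get('name'):
                return ritem
        return None

    def find_deleted_items(self, items, remote_items):
        """Simplified sketch: remote items with no local counterpart should
        be deleted, since the parsed config is the source of truth."""
        local_names = {item.get('name') for item in items}
        return [r for r in remote_items if r.get('name') not in local_names]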

Conclusion

It's super fun to solve these problems in concise reusable ways. I can't share the full code just yet, but if you're wondering about something, let me know, and I'll do my best to elaborate.

Obviously I'm not done with this sync component, but I've already gotten a long way by making it easy to extend.

As always, I'm super interested in hearing from you, just leave a comment below.
