As the engineers among you will know very well, what is presented to the user is very different to what's behind the scenes. This applies in buildings, with plant, cabling and pipework hidden behind facades, suspended ceilings, wall panels and floor tiles; and it applies equally to software. In this post, I want to give a quick rundown of the technical tasks that go into keeping our platform operational, and a peek behind the curtain at what's coming and why it will be worth the wait.
Given the scale of our enterprise, collecting and processing over 1.5 million data points every hour and maintaining over 50 code repositories containing millions of lines of code, we are an incredibly small development team. Any of our long-term customers will know that our platform has progressed enormously over the last 10 years, but sometimes it appears that nothing much is changing. Some of this cautious release of new features comes from careful planning of those features; and a lot of it, at present, from the work of keeping the features already deployed running smoothly.
Just like the buildings whose health we check (and whose maintenance we promote!), our own systems require this TLC to stay optimal, secure and performant. This quarter we have been very busy updating our entire code base to the latest and most secure version of the programming language that we use, Python. With this move to Python 3 we reaffirm our ongoing obligation to keep your data safe: as we say goodbye to the language version that has kept us up and running over the last decade, this update places us on solid ground for exciting new feature delivery and essential security updates as we move into the 2020s.
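For a flavour of what a migration like this involves, here is a toy example of the kind of small-but-everywhere change it entails (purely illustrative, not lifted from our code base):

```python
data = {"meter_a": 10, "meter_b": 20}
total, count = sum(data.values()), len(data)

# Python 2 spelling (no longer valid syntax or semantics):
#   print "Average reading: %d" % (total / count)   # / floors for integers!
#   for value in data.itervalues(): ...

# Python 3 equivalent:
print("Average reading: {}".format(total / count))  # print is a function; / is true division
for value in data.values():                          # dict views replace the iter* methods
    pass
```

Multiply that by millions of lines of code and you get a sense of the quarter we've had.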
The history of monitoring in digital, web-based systems perhaps has a better track record than that of the digitally-driven electromechanical realm (our raison d'être!); however, it is always possible to add more monitoring and to improve how we use it. In our software development cycle we have, over the last 5 years, moved from quite manually intensive, cognitively demanding monitoring and 'unpickling' (as one of our Technical Services team might call it) to far more passive and easier-to-read statistics and logs.
Part of this monitoring progress has come from robotising our development and deployment pipeline using the latest and greatest tools, such as Concourse; shifting error messages from logs on servers into full-context events via Sentry; moving from "snowflake" servers to disposable, repeatable Docker images, with Rancher managing the resulting army of serverlets; and embedding passive service statistics via StatsD and Grafana. All of these technologies stand us in ever better stead to keep our systems running smoothly as we grow and evolve, though they often take some effort to put in place. The other big benefit is that they allow us to scale up far more effectively.
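To make that concrete, instrumenting a service with this stack might look something like the sketch below. The DSN, metric names and functions are illustrative placeholders, not our production code:

```python
import sentry_sdk   # pip install sentry-sdk
import statsd       # pip install statsd

# Errors become full-context Sentry events rather than lines in a server log.
sentry_sdk.init(dsn="https://examplePublicKey@o0.ingest.sentry.io/0")

# Passive service statistics flow to StatsD, and on to Grafana dashboards.
metrics = statsd.StatsClient("localhost", 8125, prefix="platform.ingest")

def store(reading):
    ...  # placeholder for the real persistence logic

def ingest_reading(reading):
    """Hypothetical ingest step: time it, and count successes and failures."""
    with metrics.timer("process_time"):
        try:
            store(reading)
            metrics.incr("ok")
        except Exception:
            metrics.incr("error")
            raise   # the unhandled exception reaches Sentry with full context
```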
Another approach that we, as software engineers and systems administrators, have to take is, rather than maintaining what we have, to rebuild it anew with more up-to-date tools or methodologies. This too can be very time consuming, though it is often quicker than the cumulative time spent maintaining an older system. It is an essential part of the development process, allowing us to slowly retire older services while we build their replacements in parallel, and switch over to these newer services piece by piece.
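One common way to do that piece-by-piece switchover (a sketch of the general pattern, rather than our exact mechanism) is to route a small, configurable fraction of traffic to the rebuilt service and ramp it up as confidence grows:

```python
import random

# Hypothetical rollout knob: the fraction of traffic the rebuilt service handles.
NEW_SERVICE_FRACTION = 0.10

def old_service_handler(request):
    return "handled by the legacy service"

def new_service_handler(request):
    return "handled by the rebuilt service"

def handle_request(request):
    """Route each request to either the old or the new implementation."""
    if random.random() < NEW_SERVICE_FRACTION:
        return new_service_handler(request)
    return old_service_handler(request)
```

Once the new service has proven itself on a slice of real traffic, the fraction is raised until the old service receives nothing at all and can be retired.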
In the last year we have not only added a whole load of tooling that allows us to monitor and scale up our systems, but have also been actively rebuilding parts of our software offering from scratch. Features that were created long ago as prototypes and quickly became popular, essential parts of our service, such as the Scorecard (née "League Table"), have had their underlying architecture completely changed.
In a previous post I talked about the progress of our world-model, i.e. the way we represent your properties in software. As part of this remodelling, we opened up an almost endless number of possibilities for connecting data streams from the BMS to their digital asset counterparts and to the attributes of those assets. For example, if we don't have a direct BMS data stream for a given attribute of an asset (e.g. its output power), we can approximate it using knowledge of the asset's other attributes (such as the input power and efficiency, or the temperature difference across parts of the asset).
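As a simplified illustration of that kind of inference (the functions below use generic textbook relations and made-up names, not our production model):

```python
def output_power_from_efficiency(input_power_kw, efficiency):
    """Estimate output power when only input power and efficiency are known."""
    return input_power_kw * efficiency

def heat_output_from_delta_t(flow_rate_kg_s, delta_t_k, specific_heat_kj_kg_k=4.186):
    """Estimate heat transferred by water (in kW) from the flow rate and the
    temperature difference across the asset: Q = m-dot * c_p * delta-T."""
    return flow_rate_kg_s * specific_heat_kj_kg_k * delta_t_k

# e.g. a pump drawing 5 kW of electrical power at 80% efficiency:
print(output_power_from_efficiency(5.0, 0.8))   # -> 4.0 kW
# e.g. 2 kg/s of water with a 6 K temperature drop across a coil:
print(heat_output_from_delta_t(2.0, 6.0))       # -> ~50.2 kW
```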
This new way of combining the attributes of assets, and of related assets, has vastly increased the sophistication, and hence the intelligence and accuracy, of our platform. It also means that some of our calculations have become more complex, and in the digital world this comes at a cost. A large part of our restructuring in 2019 so far has been building and improving systems that make intelligent guesses about some of these asset attributes and, when the data are complete enough, process and aggregate them offline so that they are ready to drive the summary views that help you decide what to fix and see how your properties are performing.
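In spirit, that offline step looks something like the minimal sketch below (the threshold and names are made up; the real pipeline is rather more involved):

```python
from statistics import mean

# Hypothetical rule: only trust an aggregate once 90% of expected samples exist.
COMPLETENESS_THRESHOLD = 0.9

def aggregate_attribute(samples, expected_count):
    """Pre-compute a summary of an inferred attribute, but only once enough
    of the underlying data has arrived for the summary to be trustworthy."""
    completeness = len(samples) / expected_count
    if completeness < COMPLETENESS_THRESHOLD:
        return None   # not complete enough yet; try again on the next pass
    return {
        "mean": mean(samples),
        "min": min(samples),
        "max": max(samples),
        "completeness": completeness,
    }

# e.g. 55 of an expected 60 readings received for the hour:
print(aggregate_attribute([4.0] * 55, 60))
```

Doing this work ahead of time, rather than on demand, is what keeps those summary views quick to load.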
Given these new superpowers, you can look forward to exciting new views that surface summary information about floor- or zone-level KPIs, and that cut down the amount of time you have to spend looking at the underlying data to get insight into "what to fix next"!
In the meantime, we're continuing to improve our service's reliability, internal simplicity, intelligence and scalability. Thanks for your patience while we carry out these essential bits of work!