How we updated our main product’s core without downtime (first steps)


Ricardo Vila, software engineer at TEIMAS, gives a first-hand account of his work developing Teixo, a SaaS product originally written in Rails 2.3.

In 2021 we began a journey to update Teixo to the latest version of Rails, and these posts are my diary.

1. Introduction

We are TEIMAS, one of the few software companies worldwide specializing in digitalizing the waste value chain. We assist companies in transitioning towards a circular economy and decarbonization to safeguard resources, the environment, and public health. Our software solutions promote the reintroduction of waste into the production chain, reducing the need for new resource consumption.

One of our flagship solutions is Teixo, a SaaS software specifically designed for Waste Management Companies. Teixo offers a wide range of features, including:

Generation and processing of documentation for waste transfer, treatment, and identification
Process control to streamline waste operations and product offerings
Simplification of procedures to ensure compliance with regulatory requirements
Integration with all regional waste platforms and the state e-SIR platform (over 10 platforms)
Accessibility for both web users and remote systems through internal and external APIs for mobile apps and customer ERPs.

Teixo is the leading waste management tool in Spain, implemented in over 700 waste management facilities with more than 3000 active daily users.

This series of posts focuses on the project of updating our flagship product, Teixo, without any downtime and ensuring uninterrupted service to all of our customers.

This project involved not only technical staff but also members of our customer success department and, of course, many users and clients. Therefore, these posts will cover not only technical issues but also customer management and procedures to ensure the quality of Teixo.


1.1 Starting point

Teixo's first lines of code were written in 2008, using Rails 2.3 over Ruby 1.8.3, which was the leading technology of the time. Teixo uses a MySQL database which, despite its large size, performs very well.

Over the last twelve years, Teixo has evolved at a rapid pace, adding features and adapting to new infrastructure needs with great success. The main components of Teixo include:

A classic web application with some dynamic content using jQuery, but without a separate front-end/back-end structure.
Many APIs, some for internal use (mobile apps, other TEIMAS products, etc.) and some for external use by our customers.
More than 50 background processes running asynchronously on their own dedicated servers. We use the DelayedJob gem to handle this.
About 20 cron jobs that run at different frequencies (hourly, daily, weekly, or even monthly).

These parts are built together as a modular ‘monolith’.

Currently, Teixo runs on AWS infrastructure using two auto-scaling groups: one for web and API capabilities, and the other for background processes. We had also already updated Teixo's Ruby version from 1.8.3 to 2.7.3, with the help of a special LTS version of Rails 2.3 that supports Ruby 2.7.3.

However, despite the Ruby update, Teixo still faced many problems, such as outdated gems, an obsolete ActiveRecord API, and deployment difficulties. This made new developments more complex and error-prone over time.

In 2021, we decided to reduce our technical debt and scheduled a project to ensure the viability of Teixo for at least 10 more years. We decided to update Teixo's Rails version in different steps or phases, one for each major version of Rails:

Rails 3.2
Rails 4.2
Rails 5.2
Rails 6.1
Rails 7.0

At this point, we still weren't sure whether it would be better to split these steps into smaller ones (one for each minor version, e.g., Rails 3.0, then Rails 3.1, and finally 3.2) or to make each major change in a single step. Although the Rails Guides document well how to update an app from one version to the next, this project was still extremely complex.

1.2 Some Teixo metrics

To help understand the complexity of the project, here is some data about Teixo:

Declared gems: 59. Total gems: 134.

1.3 The Plan

By the end of 2021, we began to define "The Plan," which had several key requirements:

We should be able to run both the old and new versions of Teixo simultaneously during each step of the update process.
Development of new features or fixes should continue during the update process.
We needed to improve our automatic test coverage to ensure that the new Teixo versions would meet our quality standards.

1.3.1 Initial analysis

We also began searching for companies that could assist us with this process - a "Guide" for our journey. We found a Philadelphia-based company specializing in updating Rails projects and started working with them in the early phase of checking over Teixo's codebase to prepare a report about the steps and potential problems we might encounter in this project.

After completing the analysis, they proposed a step-by-step update process, one for each minor Rails version (i.e., Rails 3.0, 3.1, 3.2, 4.0, and so on). However, we decided that this approach would take too long to complete, so we opted for larger steps to achieve the update in a reasonable timeframe. Fortunately, we had a strong QA and Customer Success department that helped us test and validate the update steps quickly and reliably, even with significant platform changes.

We knew, and the analysis confirmed, that we needed to prepare for this update process. For several months, we reinforced QA on Teixo, developing many new automated tests in our CI environment and expanding the existing ones. We also strengthened our Customer Success department's protocols for checking Teixo's quality.

The analysis also proposed a dual-boot mechanism, having one codebase suitable to boot on two versions of Rails. However, we thought that this approach would be very challenging to handle and would increase the overhead of an already very complex project. Instead, we decided to tackle the project in a more streamlined way, making the necessary code changes on each step of the project.

The report included a summary of the changes needed to be made in Teixo for each Rails step, which was an adapted version of the Rails Guides for updating.


1.3.2 Two much Teixo

We proposed handling the project by developing two versions of Teixo simultaneously at each step of the process. For example, during the first phase, one version would use Rails 2.3, while the other would use Rails 3.2.

In the staging and production environments, both versions should work with the same database schema. Only one version should handle database migrations, but both versions should be compatible with the schema. This requires particular care with migrations, ensuring that both versions of the code have the same migrations, or at the very least, do not delete attributes/columns in only one version.
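One way to picture the "do not delete columns in only one version" rule is a small check run before a migration is accepted. The sketch below is only an illustration under assumptions: the unsafe_drops helper, the version labels, and the table and column names are all hypothetical and are not part of Teixo.

```ruby
require 'set'

# Hypothetical helper, not part of Teixo: given the columns a migration
# wants to drop and the columns each running code version still uses,
# return the drops that would break at least one version.
def unsafe_drops(dropped_columns, columns_in_use_by_version)
  dropped = dropped_columns.to_set
  columns_in_use_by_version.each_with_object({}) do |(version, in_use), result|
    conflicts = dropped & in_use.to_set
    result[version] = conflicts.to_a.sort unless conflicts.empty?
  end
end

# Illustrative data only: the old version still reads a legacy column.
in_use = {
  'rails-2.3' => ['documents.state', 'documents.legacy_code'],
  'rails-3.2' => ['documents.state']
}

conflicts = unsafe_drops(['documents.legacy_code'], in_use)
puts conflicts.inspect
```

An empty result would mean the drop is safe for both versions. A non-empty one names the version that still depends on the column, so the migration should wait until that version is retired.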


1.3.3 Double trouble

We needed to incorporate the update process into our regular development workflow, so we made some adjustments to our development process.

In our repository, we had a master branch for the 'old' version of Teixo and an 'edge' branch for the changes required in the current update step.
Normal development continued to push changes to the master branch, while Rails update changes were pushed to the new 'edge' branch.
Each branch had its own configuration in our CI environment (Jenkins) and ran its set of automated tests.
Changes on both branches needed to be reviewed and approved in the same way.

At each step of the update process, there are two phases of work to be completed.

The first phase focuses on making the necessary changes to get the different parts of Teixo running with the new Rails version, and on checking for errors in the automated tests on the CI environment, reducing them with each push. These changes are pushed to the edge branch. During the initial pushes, the CI test results are 'ignored' and the changes are merged despite having many errors. We can call this phase 'proactive' technical work.
Once those errors have been fixed and the new version seems stable, the second phase begins. At this point, the technical staff starts their 'reactive' work hand in hand with the Customer Success team (which has greater involvement) to validate and test the new version. The later work in this phase also involves certain customers.

If we had chosen to take smaller steps (minor Rails version changes), the overhead of the second phase would have been excessive, particularly for the Customer Success department.

Meanwhile, the normal Teixo development workflow continued. During this second phase, we had to change our usual procedure in the following aspects:

With each Teixo release, the new code on the master branch should be merged and reviewed again on the edge branch. It's worth mentioning that Teixo frequently has small releases, usually one per week, which helps to keep both code branches close enough.
Developers on the master branch should be careful when committing changes incompatible with the new Rails version. In such cases, the changes should be tagged in the source code, and the merging process should take these special changes into account.

We also modified our process for new Teixo releases to handle this duality.

We used to have one staging environment where the new release was deployed, tested, and validated before the final publication in production. Now, we have two staging environments, one for the master branch and the other for the edge. Both share the same database.
Deployments on staging are made in both environments.
The Customer Success department's testing and validation, prior to the release of a new version, are carried out on both environments.


2. Rails 3.2 (I of II): First steps

In April 2022, we began working on the first and by far the most complex step we would take: updating from Rails 2.3 to 3.2 without any intermediate steps.

2.1 Collaborations

We hired one expert developer from our 'Guide' company to team up with two senior developers from TEIMAS. Unfortunately, this collaboration did not improve the project as expected, due to several obstacles that made collaboration difficult:

Communication between teams was poor, mainly due to timezone issues between Spain and where the external developer was located (with only one hour of overlap). This was a significant issue in day-to-day work.
Teixo has highly complex business logic that is not easy to understand unless you work in the waste management industry. The external team did not grasp the business logic behind the code, and more guidance from our side could have helped. More fluent communication between the teams would also have been beneficial.
The external team was not comfortable with our tools (Monday.com, Gerrit) and workflow. They preferred smaller steps and the dual-boot mechanism.
The way of addressing problems was very different. We aimed to solve cross-cutting problems once, in a single place, while their approach was to make small changes each time a problem arose. For example, Teixo heavily used the replace_html function (almost 300 occurrences in more than 50 files), which was deprecated in Rails 3.2. Their approach was to fix every instance of this function, while we chose to define our own replace_html adapted to Rails 3.
Finally, we found that their analysis did not cover all the problems we faced, including several that had a significant impact (and which we thought were obvious). Their experience in updating Rails apps did not show during our collaboration.
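To illustrate the single-helper approach to replace_html described in the list above, here is a minimal sketch of what such a custom helper could look like. It is not Teixo's actual implementation: it assumes the helper only needs to return a jQuery statement as a string, and the module name LegacyJsHelpers is hypothetical.

```ruby
require 'json'

# Hypothetical stand-in for a custom replace_html; not Teixo's real code.
module LegacyJsHelpers
  # Returns a jQuery statement that replaces the inner HTML of the element
  # with the given DOM id. to_json escapes quotes and newlines so both the
  # selector and the markup are valid JavaScript string literals.
  def replace_html(dom_id, html)
    selector = "##{dom_id}"
    "$(#{selector.to_json}).html(#{html.to_json});"
  end
end

helper = Object.new.extend(LegacyJsHelpers)
puts helper.replace_html('notice', 'Saved')
# prints: $("#notice").html("Saved");
```

Defining the helper once keeps the existing call sites unchanged, which is the trade-off described above: one cross-cutting fix instead of hundreds of local edits.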

Ultimately, our collaboration ended on friendly and mutual terms before completing the first step. As a result, we finished the first update without external help.


2.2 Testing, testing and more testing

Before starting work with a new branch for Rails 3.2, we put in a significant effort to increase Teixo's automated test coverage, reaching almost 70%. This work was one of the most critical components for the project's success.

Additionally, we addressed Deprecation Warnings, such as 'ActiveRecord::Base#class_name is deprecated,' and implemented other improvements such as organizing the code more effectively and cleaning up unused parts of the project.


2.3 Technical work

This section covers the various technical changes, problems, and issues we encountered when migrating from Rails 2.3 to 3.2. This work was mostly carried out in the first part of the update step, but some was also done during the 'reactive work' phase to address previously undetected issues and errors that were brought to our attention by the Customer Success department and even some trusted customers.

2.3.1 Bump Rails version to 3.2 LTS

2.3.1.1 Gems and more Gems

As previously mentioned, we upgraded to a Rails 3.2 LTS version supported by Makandra, a team of veteran Rails developers and operations engineers, as they define themselves.

We use Bundler, and once we had updated the Gemfile to use:

gem 'rails', '~> 3.2.22.27'

we had to fix some Gem issues.

1. Our version of the Mysql2 gem (0.5.3) was incompatible with this version of Rails. As a result, we had to switch to a different version of the Mysql2 gem, which was available from the Makandra repository:

gem 'mysql2', git: 'https://github.com/makandra/mysql2', branch: 'master'

2. We also had to fork Makandra's activerecord gem and modify it to support newer versions of the mysql2 gem. This simply involved changing one line (the gem version requirement) in lib/active_record/connection_adapters/mysql2_adapter.rb:

    require 'active_record/connection_adapters/abstract_mysql_adapter'

    gem 'mysql2', '> 0.3.10'
    require 'mysql2'

    module ActiveRecord
      class Base
        ....

The JRails plugin was deleted and replaced with jquery-rails.
We updated several other gems in a straightforward way.
Some gems required additional testing and troubleshooting, but we did not encounter any significant issues. For instance, we use the Postmark gem, and its behavior had changed when using Postmark's API. As a result, we had to redefine how we used it inside Teixo.

2.3.1.2 Tune the configuration and code

Configuration and several other files loaded their dependencies using a File path idiom that had changed. Consequently, we had to make changes in several places, replacing lines such as:

    require File.dirname(__FILE__) + '/../config/boot'

with:

    require File.expand_path('../../config/boot', __FILE__)
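Both idioms resolve to the same file. The short sketch below shows the equivalence, using a hypothetical script path on a Unix-like system (the /srv/teixo location is an assumption for illustration only).

```ruby
# Hypothetical location of a script inside the app, for illustration only.
script = '/srv/teixo/script/console'

# Older idiom: string concatenation relative to the script's directory.
old_style = File.dirname(script) + '/../config/boot'

# Newer idiom: expand_path resolves the relative part against the file
# itself, so the first '..' strips the file name.
new_style = File.expand_path('../../config/boot', script)

puts File.expand_path(old_style)
puts new_style
```

The newer form always yields an absolute, normalized path, which is useful because Ruby 1.9 and later no longer include the current directory in the load path.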


We also had to update several boot and configuration files along with many initializers.

Config: config.ru, for example, changed on several lines. The main change was from:

    run ActionController::Dispatcher.new

to:

    run Teixo::Application

Application: We had to build a new config/application.rb with the app configuration (extracted from the config/environment.rb file), declaring the Teixo app as:

    module Teixo
      class Application < Rails::Application
        # ... app configuration moved from config/environment.rb ...
      end
    end

Boot: config/boot.rb was reduced to loading the Gemfile and Bundler:

    require 'rubygems'

    # Set up gems listed in the Gemfile.
    ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)

    # File.exist? replaces File.exists?, which is deprecated in recent Ruby versions.
    require 'bundler/setup' if File.exist?(ENV['BUNDLE_GEMFILE'])

Rakefile: it has to define how to load the configuration and the application:

    require File.expand_path('../config/application', __FILE__)
    ...
    Teixo::Application.load_tasks

Routes: The most time-consuming change was rewriting the config/routes.rb file. Route declaration changed significantly from Rails 2 to 3. Although a tool exists to translate routes from Rails 2 to 3, our routes.rb file was not well structured, so we had to convert many of the routes to the new format manually.

It was very handy to try route helpers from the console (once we got it to work), for example:

    irb(main):002:0> app.admin_users_path
    => "/admin/users"

2.3.1.3 Let it run

Despite the gem updates and configuration changes, the project could still launch neither a Rails console nor the server. We needed to make several more changes.

The RAILS_ENV constant was no longer supported; we had to replace it with Rails.env throughout the project.

Many references to ActionController::xxxxxx had to change to ActionDispatch::xxxxx (for example, ActionController::Routing::Routes).

All scope declarations had to change from ‘named_scope’ to simply ‘scope’. Also, some scopes on superclasses didn’t work well in subclasses, so we had to rewrite some of them as class methods using ‘scoped’. For example:

    named_scope :not_draft, lambda { {:conditions => ["#{table_name}.state != ?", DOCUMENT_STATES[:draft]]} }

became:

    def self.not_draft
      scoped(:conditions => ["#{table_name}.state != ?", DOCUMENT_STATES[:draft]])
    end

We replaced the few uses of attr_accessor_with_default in Teixo because it was no longer supported. We fixed each case according to need, mostly by using attr_writer to declare the field and/or defining a getter method.
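As a rough illustration of that replacement pattern, the sketch below uses attr_writer plus a getter with a fallback value. The Report class, the page_size attribute, and the default of 25 are hypothetical and are not taken from Teixo.

```ruby
# The removed ActiveSupport macro allowed something like:
#   attr_accessor_with_default :page_size, 25
# A plain-Ruby replacement, with hypothetical class and attribute names:
class Report
  attr_writer :page_size

  # Getter that returns the default until a value has been assigned.
  def page_size
    instance_variable_defined?(:@page_size) ? @page_size : 25
  end
end

report = Report.new
default_size = report.page_size   # falls back to the default
report.page_size = 50
custom_size = report.page_size    # returns the assigned value
puts default_size, custom_size
```

Checking instance_variable_defined? rather than nil? keeps an explicit nil assignment distinguishable from "never assigned". Whether that matters depends on the attribute.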

Saving without validations was no longer available through a ‘save(false)’ call; it is now save(:validate => false).

Several helper methods used in Haml (and Erb) partials no longer worked with the - operator; we had to change it to the = operator. For example:

    - form_tag session_path, :id => 'login_form' do

changed to:

    = form_tag session_path, :id => 'login_form' do

The flash method in Rails 2 returned an object that responded to the Hash API. In Rails 3 this is no longer true, so we had to call to_h on every use:

    if flash.values.empty?

changed to:

    f_hash = flash.to_h
    if f_hash.values.empty?
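The plain-Ruby sketch below mimics the situation with a hypothetical FakeFlash class: an object that is Enumerable but not a Hash has no values method, yet to_h turns it back into a regular Hash. It is a stand-in for illustration only, not the Rails flash implementation.

```ruby
# Hypothetical stand-in for an Enumerable, non-Hash flash object.
class FakeFlash
  include Enumerable

  def initialize(messages = {})
    @messages = messages
  end

  # Enumerable only needs #each; yielding key/value pairs lets to_h work.
  def each(&block)
    @messages.each(&block)
  end
end

flash = FakeFlash.new(:notice => 'Saved')
has_values = flash.respond_to?(:values)   # no Hash API on the wrapper
f_hash = flash.to_h                       # back to a plain Hash
puts has_values, f_hash.values.empty?
```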


The use of @template inside controllers is no longer supported. Instead, the view_context method is recommended (Rails 3 introduced the new AbstractController).

Date

25/5/23

Category

Technology

Tags

Software Teixo