How we updated our main product’s core without downtime

Teixo: from Rails 2.3 to Rails 4.2

We are TEIMAS, one of the few software companies worldwide specialized in digitalizing the waste value chain. We help companies in their transition to the circular economy and decarbonisation in order to protect resources, the environment and people's health. Our software solutions favour the reintegration of waste into the production chain to reduce the consumption of new resources.

One of those solutions is Teixo, a SaaS product designed for waste management companies. Some of its features are:

  1. Generates and processes documentation for transfer, treatment and identification.
  2. Controls processes and streamlines waste operations and product offers.
  3. Simplifies procedures, ensuring compliance with regulatory requirements.
  4. Connects with all regional waste platforms and the state e-SIR platform (more than 10 platforms).
  5. Serves both web users and remote systems through internal and external APIs for mobile apps and our customers’ ERPs.

Teixo is the leading tool in Spain, implemented in more than 700 waste management facilities and with more than 3000 active users per day.

This series of posts covers the project of updating our main product, Teixo, without downtime while guaranteeing service to all of our customers.

This project involved not only technical staff but also people from the Customer Success department and, of course, many users and clients. So these posts will cover not only technical issues but also customer management and the procedures that guarantee Teixo’s quality.

4.1 Starting point

The first lines of code in Teixo were written in 2008 using the leading technology of the time, Rails 2.3 on Ruby 1.8.3. Teixo works with a MySQL database that performs very well despite its large size.

Teixo evolved at a very high rate for more than twelve years after those first lines of code, mostly adding features and adapting to new infrastructure needs. And it did so well.

The main parts of Teixo are:

  1. A classic web application with some dynamic content using jQuery, but without a separate front-end/back-end structure.
  2. Many APIs, some for internal use (mobile apps, other Teimas products, etc.) and some for external use by our customers.
  3. More than 50 background processes running asynchronously on their own dedicated servers. We use the DelayedJob gem to handle them.
  4. About 20 cron jobs that run at different frequencies (hourly, daily, weekly or even monthly).

Those parts are built together as a modular ‘monolith’.

Teixo currently runs on AWS infrastructure using two different auto scaling groups, one for web and API capabilities and the other for background processes.

An important point: we had already updated the Ruby version from 1.8.3 to 2.7.3. To do this we relied on a special LTS version of Rails 2.3 (https://railslts.com/) with support for Ruby 2.7.3.

But despite having an updated Ruby version, Teixo had many problems, such as outdated gems, an obsolete ActiveRecord API and deployment difficulties. As a result, new developments became more complex and error-prone as the years passed.

In 2021 we wanted to reduce our technical debt, so we decided to schedule a project to guarantee the viability of Teixo for at least 10 more years. We decided to update Teixo’s Rails version in several steps or phases, one for each major version of Rails:

  • Rails 3.2
  • Rails 4.2
  • Rails 5.2
  • Rails 6.1
  • Rails 7.0

At this point we still didn’t know whether it would be better to split those steps into smaller ones (one for each minor version, i.e. Rails 3.0, then Rails 3.1 and finally 3.2) or to make each major change in one go.

Rails has good documentation, especially about updating apps from one version to the next (the Rails Guides), but even with this help the project we faced was highly complex.

4.2 Some Teixo metrics

To help understand the difficulty of the project, here is some data about Teixo:

Declared gems: 59. Total gems: 134.

4.3 The Plan

By the end of 2021 we began to specify ‘The Plan’. It had some key requirements:

  1. We should be able to run two different versions of Teixo at the same time, the old one and the new one, on each step of the update.
  2. Development of new features and fixes should continue during the update process.
  3. We had to improve our automated test coverage to guarantee that the new Teixo versions would have sufficient quality.

4.3.1 Initial analysis

We also started to look for companies that could help us with this process, a ‘Guide’ for our journey. We found a Philadelphia company specialized in updating Rails projects. We started working with them in an early phase in which they would review Teixo’s codebase and prepare a report about the steps and possible problems we would find in this project.

When the analysis was finished, they proposed a step-by-step update process, one step for every minor Rails version: Rails 3.0, 3.1, 3.2, 4.0, and so on. This approach would take years to complete, so we preferred to take bigger steps in order to finish the update in a reasonable time. Fortunately we had an ace up our sleeve: a great QA and Customer Success department that would help us test and validate the update steps quickly and reliably (even with big platform changes).

We knew, and the analysis confirmed, that we had to prepare for this update process. For several months we reinforced QA on Teixo, developing many new automated tests in our CI environment and expanding the existing ones. We also strengthened our Customer Success department’s protocols for checking Teixo’s quality.

The analysis also proposed how to handle the update process: a single codebase able to boot on two versions of Rails (a dual boot mechanism). This would mean filling the codebase with ifs to separate the code for each Rails version wherever it had to differ. Later the ifs would be removed, leaving only the modern Rails code, and the whole cycle would be repeated on each step of the project. This was the company’s usual procedure for this kind of project. We thought this approach would be very hard to handle and would add overhead to an already very complex project.
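
To make that overhead concrete, here is a minimal plain-Ruby sketch of the kind of version switch a dual boot codebase accumulates. The RAILS_MAJOR environment variable and both method names are hypothetical and do not come from the analysis report; a real setup would more likely read the version of the loaded framework.

```ruby
# Hypothetical switch: which Rails major version is this process booting?
# Assumption: the deployment exports RAILS_MAJOR; it defaults to the old line.
def rails_major(env = ENV)
  Integer(env.fetch('RAILS_MAJOR', '2'))
end

# One of many call sites that would need a branch during the transition:
# skipping validations is save(false) on Rails 2 and
# save(:validate => false) on Rails 3.
def skip_validation_args(major = rails_major)
  major >= 3 ? [{ :validate => false }] : [false]
end

# A call site would then read: record.save(*skip_validation_args)
```

Every API that differs between the two versions needs a branch like this, and each branch has to be removed again once the old version is retired.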

Finally, the report included a summary of the changes Teixo needed on each Rails step; it was an adapted version of the Rails upgrade guides.

4.3.2 Two much Teixo

We proposed to approach the project by handling two versions of Teixo at the same time on each step of the process, i.e. one version with Rails 2.3 and another with Rails 3.2 for the first phase.

In the staging and production environments both versions had to work with the same database schema. Only one version would run database migrations, but both had to be ‘compatible’ with the schema. This meant being especially careful with migrations: keeping the same migrations in both versions of the code, or at least not deleting attributes/columns in only one version.

4.3.3 Double trouble

The update process had to run alongside the normal development workflow, so we had to tweak our development process a bit.

  1. In our repository we had a master branch for the ‘old’ version of Teixo and an ‘edge’ branch for the changes required by the current update step.
  2. Normal development continued pushing changes to master, while Rails update changes were pushed to the new ‘edge’ branch.
  3. Each branch had its own configuration in our CI environment (Jenkins) and ran its own set of automated tests.
  4. Changes on both branches had to be reviewed and approved in the same way.

Each step of the update process has two phases of work.

  • The first phase focuses on making the changes needed to get the different parts of Teixo running on the new Rails version and on checking the errors reported by the automated tests in the CI environment, reducing them with each push. Those changes are pushed to the edge branch. On the first pushes the CI test results are ‘ignored’, and changes are merged despite many errors. We can call this phase the ‘proactive’ technical work.
  • Once those errors have been fixed and the new version seems stable, the second phase starts. At this point the technical staff starts its ‘reactive’ work hand in hand with the Customer Success team (which has greater involvement) to validate and test the new version. The later work in this phase also involved certain customers.

If we had chosen to take smaller steps (minor Rails version changes), the overhead of the second phase would have been excessive, especially for the Customer Success department.

Meanwhile Teixo’s normal development workflow continued, so in this second phase we had to change our usual procedure in the following ways:

  1. With each Teixo release, the new code on the master branch had to be merged into the edge branch and reviewed again there. It’s worth mentioning that Teixo has frequent small releases, usually one per week, which helped keep both branches close enough.
  2. Developers on the master branch had to be careful when committing changes incompatible with the new Rails version. In that case the change had to be tagged in the source code, and the merging process had to take these special changes into account.

Our process for new Teixo releases was also modified in order to handle this duality. 

  1. We used to have one staging environment where a new release was deployed, tested and validated before its final publication in production. Now we have two staging environments, one for the master branch and another for the edge branch. Both share the same database.
  2. Deployments to staging are made in both environments.
  3. The Customer Success department’s testing and validation prior to releasing a new version is done in both environments.

5 Rails 3.2

In April 2022 we started to work on the first and (by far) most complex of the steps: updating from Rails 2.3 to 3.2 without intermediate steps.

5.1 Collaborations

We hired an expert developer from our ‘Guide’ company to team up with two Teimas senior developers. But this collaboration didn’t help the project. Several obstacles made it difficult:

  • Communication between teams was not good, mainly because of the time difference between Spain and the external developer’s location (only one overlapping hour). In day-to-day work this was a big issue.
  • Teixo has very complex business logic that is not easy to understand unless you work in the waste management area. The external team didn’t grasp the business logic behind the code. More teaching from our side would surely have helped, or at least more fluent communication between the teams.
  • The external team was not comfortable with our tools (Monday.com, Gerrit) and workflow (they preferred smaller steps and the dual boot mechanism).
  • Our ways of addressing problems were very different. We bet on solving cross-cutting problems in a cross-cutting way, whereas their approach was to make a small change at each occurrence of the problem. For example, Teixo made heavy use of the replace_html function (almost 300 occurrences in more than 50 files), which was deprecated in Rails 3.2. Their approach was to fix every use of this function. We chose to define our own replace_html adapted to Rails 3.
  • Finally, we found that their analysis didn’t cover all the problems we ended up facing, including several that had a great impact (and that we thought were obvious). Their experience in updating Rails apps didn’t show during our collaboration.

In the end the collaboration finished in a friendly way and by mutual agreement before the end of the first step, so we completed the first update without external help.

5.2 Testing, testing and more testing

Before starting to work on a new branch for Rails 3.2 we made a great effort to increase Teixo’s automated test coverage. We reached nearly 70%. This work was one of the most relevant factors in the success of the project.

We also worked on fixing some deprecation warnings, such as ‘ActiveRecord::Base#class_name is deprecated’, and on other improvements such as better code organization and removing unused parts of the project.

5.3 Technical work

This part details the technical changes, problems and issues we faced when migrating from Rails 2.3 to 3.2. Most of this work was done in the first phase of this update step, but some was also done in the ‘reactive’ phase, solving previously undetected issues and errors that the work of the Customer Success department, and even some trusted customers, brought to light.

5.3.1 Bump rails version to 3.2 LTS

5.3.1.1 Gems and more Gems

As mentioned, we upgraded to a Rails 3.2 LTS version supported by Makandra, who define themselves as a team of veteran Rails developers and operations engineers.

We use Bundler, and once we had updated the Gemfile to use:

gem 'rails', '~> 3.2.22.27'

we had to fix some gem issues.

  1. Our version of the mysql2 gem (0.5.3) was not compatible with this version of Rails. We had to switch to the mysql2 gem from the Makandra repository:

gem 'mysql2', git: 'https://github.com/makandra/mysql2', branch: 'master'

  2. We also needed to fork Makandra’s activerecord gem simply to tell it that newer mysql2 gem versions were also supported. This meant changing one line in lib/active_record/connection_adapters/mysql2_adapter.rb:

require 'active_record/connection_adapters/abstract_mysql_adapter'

gem 'mysql2', '> 0.3.10'

require 'mysql2'

module ActiveRecord

  class Base

....

  3. The JRails plugin was deleted and replaced with jquery-rails.
  4. We updated several other gems in a straightforward way.
  5. Others required some trial and error with the tests, but there were no relevant issues. For example, we use the Postmark gem. Its behaviour when using Postmark’s API had changed, so we had to redefine how we used it inside Teixo.

5.3.1.2 Tune the configuration and code

Configuration files and many others loaded their dependencies using File path idioms that had changed, so we had to make changes in several places, for example:

require File.dirname(__FILE__) + '/../config/boot'

To

require File.expand_path('../../config/boot',  __FILE__)

We also had to update several boot and configuration files along with many initializers.

Config: config.ru, for example, changed on several lines, mainly from:

run ActionController::Dispatcher.new

to

run Teixo::Application

Application: We had to build a new config/application.rb with the app configuration (extracted from the config/environment.rb file) declaring Teixo app as:

module Teixo

class Application < Rails::Application

Boot: config/boot.rb was simplified to just load the Gemfile and Bundler.

require 'rubygems'

# Set up gems listed in the Gemfile.

ENV['BUNDLE_GEMFILE'] ||= File.expand_path('../../Gemfile', __FILE__)

require 'bundler/setup' if File.exists?(ENV['BUNDLE_GEMFILE'])

Rakefile: it has to define how to load the configuration and the application:

require File.expand_path('../config/application', __FILE__)

...

Teixo::Application.load_tasks

Routes: The most time-consuming change was rewriting the config/routes.rb file. Route declaration changed a lot from Rails 2 to 3. Although tools exist for translating routes from 2 to 3, our routes.rb file was not very well structured, so we had to port many of the routes to the new format manually.

It was very handy to try route helpers from the console (once we got it to work), for example:

irb(main):002:0> app.admin_users_path

=> "/admin/users"

5.3.1.3 Let it run

Despite the gem updates and configuration changes, the project could launch neither a Rails console nor the server. We needed to make several more changes.

RAILS_ENV was no longer supported; we had to change it to Rails.env throughout the project.

All references to the ActionController::xxxxxx namespace had to change to ActionDispatch::xxxxx (for example ActionController::Routing::Routes).

All scope declarations had to change from ‘named_scope’ to simply ‘scope’. Also, some scopes on superclasses didn’t work well in subclasses, so we had to rewrite some of them as class methods using ‘scoped’. For example:

named_scope :not_draft, lambda { {:conditions => ["#{table_name}.state != ?", DOCUMENT_STATES[:draft]]} }

Became:

def self.not_draft

  scoped(:conditions => ["#{table_name}.state != ?", DOCUMENT_STATES[:draft]])

end

We changed the few uses of attr_accessor_with_default in Teixo because it was no longer supported. We fixed each case according to its needs, mostly using attr_writer to declare the field and/or defining a getter method.
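
A minimal sketch of that replacement pattern in plain Ruby follows. The Report class, the page_size attribute and its default are hypothetical and only illustrate the attr_writer plus getter approach; they are not taken from Teixo.

```ruby
# Hypothetical class; names and default value are illustrative only.
class Report
  DEFAULT_PAGE_SIZE = 'A4'

  # Rails 2 equivalent: attr_accessor_with_default :page_size, 'A4'
  attr_writer :page_size

  # Return the default until a value has been assigned explicitly,
  # including an explicit nil.
  def page_size
    instance_variable_defined?(:@page_size) ? @page_size : DEFAULT_PAGE_SIZE
  end
end

report = Report.new
puts report.page_size      # default, nothing assigned yet
report.page_size = 'A3'
puts report.page_size      # the assigned value
```

Checking instance_variable_defined? rather than nil keeps an explicitly assigned nil distinct from "never assigned"; whether that distinction matters depends on each case.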

Saving without validations is no longer done with a ‘save(false)’ call; it is now save(:validate => false).

Several helper methods used in Haml (and Erb) partials no longer worked with the - operator; we had to change it to the = operator.

- form_tag session_path, :id => 'login_form' do

Changed to

= form_tag session_path, :id => 'login_form' do

In Rails 2 the flash method returned an object that responded to the Hash API. In Rails 3 this is no longer true, so we had to call to_h on every use:

if flash.values.empty?

Changed to

f_hash = flash.to_h

if f_hash.values.empty?

Using @template inside controllers is no longer supported. The recommendation is to use the view_context method instead (Rails 3 introduced the new AbstractController).

5.3.2 Other changes and fixes

Once the app was running we kept working through many changes, most of them during the first phase of this update while trying to get to 0 errors in our CI environment. But many other problems were detected and fixed once the app was deployed to our staging environment, thanks to the Customer Experience department.

Sometimes we faced problems that would require a lot of work to fix properly, so we first worked around them with monkey patches and scheduled the proper fix for later, which let us bypass the error and carry on with the process. Later we would attempt the proper fix, but only after assessing whether it was worth the work (sometimes it wasn’t).

Many of the problems we faced were the following:

html_safe?

Teixo has a lot of helpers that return strings with embedded HTML, CSS or even JS. With the arrival of XSS protection in Rails 3, the output of those helpers and other variables in the Erb and Haml templates was escaped, making Teixo unusable. The right fix would have been to review all the partials and call .html_safe on every ‘unsafe’ string. That approach was not feasible given the size of the project. We knew that our helpers and the rest of the code in the partials were safe (we filter user input), so we worked around this with a monkey patch:

We defined a module CustomHtmlSafe:

module CustomHtmlSafe

  def html_safe?

    true

  end

end

And we monkey patched several classes. ActiveSupport::SafeBuffer also had to override to_s to avoid rendering problems:

class ActionView::OutputBuffer

  include CustomHtmlSafe

end

class ActiveSupport::SafeBuffer

  include CustomHtmlSafe

  def to_s

    "#{self}"

  end

end

class ActionView::SafeBuffer

  include CustomHtmlSafe

end

class String

  include CustomHtmlSafe

end
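
To illustrate what the escaping looks like, and therefore what this patch switches off, here is a minimal sketch that uses ERB::Util.html_escape from Ruby’s standard library as a stand-in for the escaping applied by the template engine. The status_badge helper is hypothetical.

```ruby
require 'erb'

# Hypothetical helper that returns markup as a plain String.
def status_badge(label)
  "<span class=\"badge\">#{label}</span>"
end

# Stand-in for what happens to a String that is not marked html_safe:
# the markup is escaped and reaches the browser as literal text.
escaped = ERB::Util.html_escape(status_badge('Draft'))
puts escaped
```

Because the patch marks every String as safe, user-supplied strings stop being escaped as well, so the approach relies entirely on the input filtering mentioned above.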

form_for, remote_form_for and link_to_remote

The form_for method no longer takes a symbol as its first parameter; we had to change every form_for from this:

=form_for :document, @document, :url => url do |f|

To this

=form_for @document, :url => url do |f|

Also, when the name of the form param doesn’t match the object’s name, the symbol that used to be the first parameter must now be passed through the :as option:

=form_for @document, :as => :document, :url => url do |f|

Teixo had a lot of partials that used remote_form_for, which no longer exists. The Rails guide recommends using form_for with the :remote option instead. We had to rewrite all those remote forms.

Routes helper method changes

Several route helper methods changed from Rails 2, so we had to review and rewrite them, especially for create and update actions:

update_api_v2_devices_waste_collection_path(wc.id)

To:

api_v2_devices_waste_collection_path(wc.id)

The same applied to remote route helpers such as create_xxxx_remote_path or update_xxxx_remote_path.

After initialize

In Rails 2 the after_initialize callback was declared as a method; in Rails 3 this changed to the standard macro style:

def after_initialize

  self.effective_company_type ||= DEFAULT_COMPANY_TYPE

end

To:

after_initialize do |init_object|

  init_object.effective_company_type ||= DEFAULT_COMPANY_TYPE

end

Replace_html and render :update

This was one of the most complex changes we had to make. The replace_html method is no longer supported in Rails 3. Most of the dynamic behavior in Teixo’s frontend comes from this Rails feature, so we could not do without it. We couldn’t afford to change Teixo’s frontend behavior at that moment (it will be a huge project in a later phase of the update), so we implemented our own replace_html based on jQuery (the JS library we already had in Teixo).

Our definition was included in jquery_helper.rb and looks like this:

  def replace_html(element_id, html)

    insert_html(:html, element_id, html)

  end

  def insert_html(position, element_id, html)

    insertion = position.to_s.downcase

    insertion = 'append' if insertion == 'bottom'

    insertion = 'prepend' if insertion == 'top'

    # Adds immediate timeout to execute complete and success callbacks

    %Q(

      setTimeout(function () {

        jQuery('##{element_id}').#{insertion}('#{escape_javascript(html)}');

        $(document).trigger('ajax:replaced', jQuery('##{element_id}'));

      });

    )

  end

So replace_html receives the same parameters as before: a DOM element id and the HTML that will replace the element’s content. But we also had to change how this response was sent to the browser. render :update was no longer usable, so we had to develop a new way: our generate_js_response, which simply renders the JS built in the block it receives:

def generate_js_response(&block)

  render "shared/js_response.js", :locals => {:js_content => block}

end

And the shared/js_response.js

<% content = [] %>

<% self.instance_exec(content, &js_content)%>

<%= content.join %>

So, finally we changed our uses of replace_html from this:

   render :update do |page|

       page.replace_html('link_add_bank_account', render(:partial => 'remote_form'))

   end

To this:

   generate_js_response do |page|

       page << replace_html('link_add_bank_account', render(:partial => 'remote_form'))

   end

Only render :update had to be changed, not the replace_html calls, so the work of changing the 300 occurrences of replace_html became much more affordable.
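
The collection mechanism behind generate_js_response and its template can be sketched in plain Ruby, with no Rails involved. collect_js and fake_replace_html are hypothetical stand-ins written for this illustration; the real helpers are the ones shown above.

```ruby
# Plain-Ruby sketch of the pattern: the block receives an Array that plays
# the role of `page`, appends JavaScript snippets to it, and the snippets
# are joined into a single response body.
def collect_js(&block)
  content = []
  instance_exec(content, &block)
  content.join
end

# Stand-in for a helper that returns a JavaScript snippet as a String.
def fake_replace_html(element_id, html)
  "jQuery('##{element_id}').html('#{html}');"
end

js = collect_js do |page|
  page << fake_replace_html('notice', 'Saved')
  page << fake_replace_html('errors', '')
end
puts js
```

Since `page` is just an Array, `page << ...` keeps the old call sites almost unchanged while the helpers only have to return strings.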

Mailers

Rails 3 made several changes to the mailing system, so we had to make some significant changes, not all of which were in the documentation:

  • No more Mailer.deliver_xxxxx() methods. They must be changed to Mailer.xxxx().deliver.
  • The way to compose and use partials and templates changed. We had to fix issues related to:
      • How to name the partials used in mail composition (in our case from .text.html.haml to .html.haml).
      • All the locals used in a mailer partial must be passed, even if they are nil.
  • Composing emails with attachments changed even more. We needed to attach the documents and define the email as one with mixed content.

  def set_multipart_structure(mixed_mail)

    if attachments.any?

      # Set the message content-type to be 'multipart/mixed'

      mixed_mail.content_type 'multipart/mixed'

      mixed_mail.header['content-type'].parameters[:boundary] = mixed_mail.body.boundary

      # Set Content-Disposition to nil to remove it - fixes iOS attachment viewing

      mixed_mail.content_disposition = nil

    end

  end

Clone and dup

The Rails clone method does not exist in Rails 3 (it comes back in Rails 4), so we replaced calls to it with the dup method.

errors.add_to_base

Error objects no longer support add_to_base. We had to change these calls to errors.add :base.

Boolean params

Params containing boolean values were magically handled in Rails 2: the string ‘true’ was automatically parsed to true, and ‘false’ to false. With the arrival of Rails 3 this behaviour changed, so we had to define a parser and use it on the params that needed it:

def parse_boolean(str_bool)

  ActiveRecord::ConnectionAdapters::Column.value_to_boolean(str_bool)

end

Another option would have been to review all the forms that use boolean values, but this was an easier way.
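
For readers without a Rails console at hand, the idea can be sketched in plain Ruby. boolean_from_param and the TRUE_STRINGS list are assumptions made for this illustration; the list only approximates what the Rails helper accepts and is not copied from the framework.

```ruby
# Assumption: an illustrative list of strings treated as true.
TRUE_STRINGS = %w[1 t true].freeze

# Convert a request parameter into true, false or nil (when blank).
def boolean_from_param(value)
  return value if value == true || value == false

  text = value.to_s.strip.downcase
  return nil if text.empty?

  TRUE_STRINGS.include?(text)
end

puts boolean_from_param('true').inspect
puts boolean_from_param('false').inspect
puts boolean_from_param('').inspect
```

Returning nil for blank input keeps "not sent" distinguishable from an explicit false, which matters for optional form fields.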

Link_to_remote

In Rails 2, it was possible to use “link_to_remote ... :update => 'id'” to replace the content of $('#id') automatically. This is not possible in Rails 3, so we had to adapt our link_to_remote wrapper in actions_link_helper.rb:

  def link_to_remote(name, options = {}, html_options = nil)

    .... # Custom implementation

    # adding the relevant part for making the html replacement work

    data_replace = options.delete(:update)

    html_options = html_options.merge(:"data-replace" => "##{data_replace}") if data_replace.present?

    data_complete = options.delete(:complete)

    html_options = html_options.merge(:"data-complete" => "#{data_complete}") if data_complete.present?

    data_before = options.delete(:before)

    html_options = html_options.merge(:"data-before" => "#{data_before}") if data_before.present?

    data_error = options.delete(:error)

    html_options = html_options.merge(:"data-error" => "#{data_error}") if data_error.present?

    link_to(name, path, options.merge(:remote => true).merge(html_options))

  end

and defined this handler in our application.js to make it work:

$('[data-remote][data-replace]')

  .data('type', 'html')

  .live('ajax:success', function(event, data) {

    var $this = $(this);

    $($this.data('replace')).html(data);

    $this.trigger('ajax:replaced');

  });

Finder_sql

Relations with a finder_sql had to be changed because the syntax changed to a proc in Rails 3.

From:

has_many :formations, :class_name => "Formation",

  :finder_sql => %q(SELECT DISTINCT ...)

To:

has_many :formations, :class_name => "Formation",

  :finder_sql => proc {"SELECT DISTINCT ..."}

attributes=

The behavior of this method changed from Rails 2 to 3. We had several uses of this method where the hash contained virtual params or non-existent attributes, which raises an error in Rails 3. So we monkeypatched the method to filter the received hash and allow only existing attributes. The monkeypatch looks like this:

 class ActiveRecord::Base

   alias_method :super_attributes=, :attributes=

   def attributes=(hash = {})

     hash ||= {}

     self.super_attributes = hash.select{|k,v| self.class.column_names.member?(k.to_s) || k.to_s.match(/_attributes\z/) || self.respond_to?(:"#{k}=")}

   end

 end
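
The filtering rule inside that patch can be tried out in plain Ruby. FakeRecord, its COLUMN_NAMES constant and assignable_attributes are hypothetical stand-ins for a model, its column_names and the select call; they only illustrate which keys survive.

```ruby
# Hypothetical stand-in for a model with two columns and one virtual setter.
class FakeRecord
  COLUMN_NAMES = %w[name state].freeze
  attr_writer :nickname # virtual attribute: has a setter but no column
end

# Keep only the keys that can be assigned: real columns, nested
# "*_attributes" keys, or anything the object has a setter for.
def assignable_attributes(record, hash)
  (hash || {}).select do |key, _value|
    name = key.to_s
    record.class::COLUMN_NAMES.include?(name) ||
      name.end_with?('_attributes') ||
      record.respond_to?(:"#{name}=")
  end
end

params = { 'name' => 'Plant A', :nickname => 'A', 'bogus' => 1, 'lines_attributes' => [] }
puts assignable_attributes(FakeRecord.new, params).keys.inspect
```

Unknown keys such as 'bogus' are dropped silently, which is the trade-off of this approach: typos in param names no longer raise an error.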

Submit_to_remote

The helper method submit_to_remote is no longer available in Rails 3, so we had to define our own in application_helper.rb:

  def submit_to_remote(name, value, options = {})

    html_options = options.delete(:html) || {}

    submit_tag value, options.merge(html_options).merge(:id => name)

  end

In some cases this was not enough so we had to replace it with a specific link_to_remote.

Errors.full_message

Teixo uses errors.full_message to display the problems in a page form (required fields, wrong format, length issues, etc.). But the behavior changed in Rails 3 and nested object errors are no longer included in full_message. So we had to monkeypatch it, as you can see:

class ActiveModel::Errors

  alias_method :old_full_message, :full_message

  def full_message(attribute, message)

    if (splitted_attribute = attribute.to_s.split(".")).count > 1

      translated_attribute = if @base.send(splitted_attribute.first).respond_to?(:any?)

        @base.send(splitted_attribute.first).first.class.human_attribute_name(splitted_attribute.second)

      else

        @base.send(splitted_attribute.first).class.human_attribute_name(splitted_attribute.second)

      end

      old_full_message(translated_attribute, message)

    else

      old_full_message(attribute, message)

    end

  end

end

Caching views

The use of cache in views changed slightly, and we struggled a lot to find out what was happening. We had some partials that used a cache defined in a special controller, with code like this (action_cache_key is a method that returns the current partial’s cache key):

Rails.cache.fetch(action_cache_key(opts)) do

  render opts[:action]

end

If the partial was not cached, the result was not rendered; but the second time we accessed this page/partial, it was already cached and rendered fine.

Finally we noticed that the Rails.cache.fetch block always has to return a String:

Rails.cache.fetch(action_cache_key(opts)) do

  block.call if block_given?

  render(opts[:action]).join("")

end

readonly(false)

Many object relations in Teixo were loaded by chaining associations and later updated in some way. With the update to Rails 3 those loads were marked as readonly by default, so further attempts to update the related objects failed. We had to check those relations manually and mark the loads with :readonly => false. For example:

outgoing_line.outgoing.update_me

Changed to

outgoing_line.outgoing(:readonly => false).update_me

Reload

The reload method, when called on a deleted object in Rails 2, simply returned nil; in Rails 3 it raises an exception. We had to review some callbacks that called reload and failed when run after a delete.



render_optional_error_file

This method no longer exists, so we changed part of the error handling to use config/routes.rb instead of this mechanism.

Flash errors

The behavior of flash messages changed between Rails 2.3 and 3.2. The messages are no longer available across redirects, so we had to call flash.keep in certain callbacks and filters.

Log_error

Rails 2 had a default mechanism to handle errors in controllers: whenever an error was raised, Rails 2 controllers called a method named log_error. Since Rails 3 that is no longer true. We had to configure an explicit call to this method in our base controller whenever an error is raised:

class ApplicationController < ActionController::Base

  rescue_from StandardError, with: :log_error

Time.zone.parse

In some places throughout the app (mostly in reports) users can filter data by dates, usually by whole months, so we used Time.zone.parse to turn partial date param strings into time boundaries. Users who wanted a report for the month of July sent params like date1_str: "/07/2022" and date2_str: "/08/2022". In Rails 2 it worked like this:

> date1 = Time.zone.parse("/07/2022")

Fri, 01 Jul 2022 00:00:00 CEST +02:00

But when we changed to Rails 3 the behavior was:

> date1 = Time.zone.parse("/07/2022")

Sun, 31 Jul 2022 00:00:00 CEST +02:00

So in Rails 3 users got data for the month of August. We fixed it by adding a call to beginning_of_month where needed.
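
An alternative that avoids depending on how a general-purpose parser fills in the missing day is to parse the partial string explicitly. This is a minimal sketch under that assumption, not the fix described above (which was beginning_of_month); month_bounds is a hypothetical helper and it returns plain Date objects rather than zone-aware times.

```ruby
require 'date'

# Parse a partial "/MM/YYYY" parameter into the first and last day of
# that month.
def month_bounds(param)
  match = %r{\A/?(\d{1,2})/(\d{4})\z}.match(param.to_s.strip)
  raise ArgumentError, "expected /MM/YYYY, got #{param.inspect}" unless match

  month = match[1].to_i
  year  = match[2].to_i
  # A day of -1 means the last day of the month.
  [Date.new(year, month, 1), Date.new(year, month, -1)]
end

first_day, last_day = month_bounds('/07/2022')
puts first_day
puts last_day
```

Rejecting anything that does not match the expected shape makes a change in parser behaviour show up as an error instead of as a silently shifted report.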

Respond_to

In Rails 3, respond_to works differently than in Rails 2, so we had to review and test the different occurrences.

This behavior also changes when no Accept header is attached to the request, and this was affecting several clients. In Rails 2 the default response type in this case was the same as the Content-Type of the request. We had to implement a before_filter on the API controllers to apply a default format when no Accept header was declared, so that Teixo works as it did in Rails 2.

  def add_accept_header_if_necesary

    if request.headers['HTTP_ACCEPT'].blank?

      Rails.logger.info("ApplicationController: Accept header empty for #{request.host}/#{request.path}")

      if request.headers['CONTENT_TYPE'].present?

        new_format = Mime::Type.lookup(request.headers['CONTENT_TYPE'])

        if new_format.present?

          request.format = new_format.ref

        end

      end

    end

  end

Render with a proc {}

In Rails 3.2 it is not possible to call render :text with a proc as an argument. Fortunately we had only a few uses of this behavior.

Disabled form fields

In Rails 2 disabled form fields act as readonly fields, so their data is sent to the server when the form is submitted. From Rails 3 onwards disabled fields are not sent to the server. So we had to review those disabled fields and either mark them as readonly or keep them disabled, depending on the case.

BigDecimal gem

We had to add the BigDecimal gem (to continue using this type). We also had to keep it at a version compatible with Rails 3, because Rails 3 uses BigDecimal.new to initialize attributes of this class and recent versions of the gem do not support it.
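As an illustration of the incompatibility (not code from Teixo): older code builds values with BigDecimal.new, while the Kernel-level BigDecimal() conversion method is the form that works on both old and recent versions of the gem. A minimal sketch, assuming a Ruby where the bigdecimal library can be required:

```ruby
require "bigdecimal"

# Kernel#BigDecimal works on both old and recent versions of the gem.
price = BigDecimal("10.50")
total = price + BigDecimal("0.25")

# BigDecimal.new only exists on older gem versions, so check instead of assuming.
legacy_constructor_available = BigDecimal.respond_to?(:new)

puts total.to_s("F")
puts "BigDecimal.new available: #{legacy_constructor_available}"
```

Application code can move to BigDecimal() freely, but a framework that calls BigDecimal.new internally still needs the older gem version, which is why the version had to be pinned.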

5.3.3 Preparing production environment

5.3.3.1 Servers and more servers

The upgrade to Rails 3 also came with the need to change the servers we were using with Rails 2. We had to configure and deploy new servers with an updated OS and updated libraries. We also had to update all the scripts and stay aware of the problems that came from having two staging and two production environments with different configurations.

Nevertheless, the most relevant change came from switching the application server from Unicorn to Puma. Although it is not necessary to use Puma with Rails 3, we decided to make this change now to get it out of the way before the next Rails upgrade we would face (Rails 4.2).

This was the biggest headache, because we faced a loss of performance that we did not understand at first. With Rails 2 we configured several Unicorn processes on each server, depending on the CPU and RAM of the server. With Puma we wanted to change this behavior and take advantage of Puma’s threads. So initially we configured only one Puma process with several threads on every frontend server, and our puma.rb looked like this:

  puts "1 workers and 7 threads"
  workers 1
  # Min and Max threads per worker
  threads 1, 7

At this point we didn’t know that Rails 3.2 does not serve requests concurrently by default. So Puma with 1 worker and 7 threads behaves the same as 1 worker and 1 thread. With this configuration each server could only handle one request at a time while the other requests waited in Puma’s queue. We noticed it thanks to New Relic’s monitoring tool.
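Why the extra threads bought nothing can be reproduced in plain Ruby. The sketch below is a simulation, not Puma or Rails code. It assumes every request is wrapped in one process-wide lock, which is likely what happened here if config.threadsafe! was not enabled, since Rails 3.2 then puts the Rack::Lock middleware around each request. peak_concurrency is a hypothetical helper that reports how many fake requests were ever in flight at once.

```ruby
require "thread"

# Runs `requests` fake requests on `threads` threads and returns the highest
# number of requests that were in flight at the same time.
def peak_concurrency(threads:, requests:, global_lock:)
  request_lock = Mutex.new   # stands in for a process-wide request lock
  stats_lock = Mutex.new     # protects the two counters below
  in_flight = 0
  peak = 0
  queue = Queue.new
  requests.times { |i| queue << i }

  handle_request = lambda do
    stats_lock.synchronize do
      in_flight += 1
      peak = in_flight if in_flight > peak
    end
    sleep 0.01               # simulated work
    stats_lock.synchronize { in_flight -= 1 }
  end

  pool = Array.new(threads) do
    Thread.new do
      loop do
        begin
          queue.pop(true)    # non-blocking pop, raises ThreadError when empty
        rescue ThreadError
          break
        end
        if global_lock
          request_lock.synchronize { handle_request.call }
        else
          handle_request.call
        end
      end
    end
  end
  pool.each(&:join)
  peak
end

serialized = peak_concurrency(threads: 7, requests: 14, global_lock: true)
concurrent = peak_concurrency(threads: 7, requests: 14, global_lock: false)
puts "with global lock: #{serialized}, without: #{concurrent}"
```

With the lock in place the peak is always 1, no matter how many threads the worker has, so the only way to serve requests in parallel is to run more processes.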

When we noticed it we changed our Puma configuration to this:

  puts "5 workers and 1 threads"
  workers 5
  # Min and Max threads per worker
  threads 1, 1

And performance returned to normal levels. We also had to fine-tune memory usage and the number of processes. Finally, we adopted a gem called puma_worker_killer to keep the behavior we previously had with Unicorn, where we used a gem called unicorn_worker_killer to avoid memory problems. Those problems seem to have gone away on Rails 3, so we plan to remove puma_worker_killer in the future.

5.3.3.2 Dealing with Stable and Edge channels together

At this point we had two Teixos in the Staging environment, one on Rails 2.3 (Stable) and the other on Rails 3.2 (Edge), both working with the same database and the same Memcached server. Each version had its own set of DelayedJob workers, but they all polled the same database for jobs to run. User sessions were also stored in the database, and all these things together led to some new problems we had to deal with.

Sessions

User sessions were stored in the database for both channels. This led to deserialization issues when a user with a session stored on Edge (Rails 3.2) attempted to use the Stable version (Rails 2.3), because Rails 2.3 didn't know how to deserialize Rails 3 objects. There is no single solution for this; it depends on the data attached to the session and how deserialization works for each object. In our case we simply added this to config/initializers/a_config.rb:

ActionController::ParamsHashWithIndifferentAccess = ActionDispatch::Http::ParamsHashWithIndifferentAccess

This way Rails 3 could unmarshal Rails 2 objects of the class ParamsHashWithIndifferentAccess and vice versa.

We also defined a session store initializer (config/initializers/session_store.rb) to deal with flash messages stored in the session:

module ActionController
  module Flash
    class FlashHash < Hash
      # Any unknown method called on a deserialized flash object returns nil instead of raising
      def method_missing(m, *a, &b)
      end
    end
  end
end
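To see why aliasing a constant is enough for the session problem, remember that Marshal stores the class name as a string and resolves it again when loading. The sketch below is plain Ruby with hypothetical module names (OldStack and NewStack stand in for the two framework versions). It is an illustration of the mechanism, not Teixo's initializer.

```ruby
# Stand-in for the class as the newer framework version names it.
module NewStack
  class ParamsHash < Hash; end
end

# Stand-in for the older version: same role, different constant path.
module OldStack
  class ParamsHash < Hash; end
end

# A session payload as the old version would have stored it.
old_params = OldStack::ParamsHash.new
old_params["user_id"] = 42
stored = Marshal.dump(old_params)

# Simulate the new app, where the old constant path does not exist.
Object.send(:remove_const, :OldStack)

begin
  Marshal.load(stored)
  load_failed = false
rescue ArgumentError
  load_failed = true   # Marshal cannot resolve "OldStack::ParamsHash"
end

# The fix: point the old constant path at the new class.
module OldStack
  ParamsHash = NewStack::ParamsHash
end

restored = Marshal.load(stored)
puts "failed before alias: #{load_failed}, restored as #{restored.class}"
```

After the alias, the stored payload loads as an instance of the new class with its data intact.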

Cache problems

A similar problem was dealing with deserialization issues in the Rails cache. In this case we changed Teixo to add an environment variable on both channels, Stable and Edge. The Rails 3.2 version of Teixo prepended this variable to every cache key it used. This makes the cache entries for Rails 3 different from those for Rails 2 (the keys are similar but Rails 3 keys have a prefix), so the Stable and Edge channels do not share cache entries.

Be aware that this can lead to inconsistencies between environments. For example, a partial of an object's show view cached on Stable is not invalidated when that object is modified on Edge, and vice versa. So watch out for cache problems if users work on both environments at the same time.
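Both effects, the isolation and the stale entries, can be sketched in a few lines of plain Ruby. A Hash stands in for the shared Memcached server; channel_key, the "edge" prefix value and the example key are hypothetical names chosen for illustration.

```ruby
# Stand-in for the single Memcached server shared by both channels.
shared_cache = {}

# Hypothetical helper: builds the key a given channel would use.
def channel_key(key, prefix)
  prefix.to_s.empty? ? key : "#{prefix}:#{key}"
end

stable_prefix = ""       # Stable (Rails 2.3) keeps the original keys
edge_prefix = "edge"     # assumed value of the environment variable on Edge

# Each channel caches its own rendering of the same partial.
shared_cache[channel_key("waste/1/show", stable_prefix)] = "rendered by Stable"
shared_cache[channel_key("waste/1/show", edge_prefix)] = "rendered by Edge"
entries_after_write = shared_cache.size

# The object is modified on Edge, so Edge expires its own entry...
shared_cache.delete(channel_key("waste/1/show", edge_prefix))

# ...but the Stable entry survives and is now stale.
puts shared_cache.keys.inspect
```

The two channels never read each other's entries, which avoids the deserialization errors, at the price of each channel only invalidating its own copy.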

DelayedJobs, workers and offsets

As mentioned earlier, Teixo uses the DelayedJob gem. It defines tasks as different job classes extending Delayed::Backend::ActiveRecord::Job. We also simulate job categories by grouping job types by priority: for example, priorities 0 to 4 mean category A, priorities 5 to 9 mean category B, and so on. In the production environment we run a different number of workers per category: two workers for category A, one for category B, etc.

When we planned to have two channels, we needed a way to ensure that workers on Stable would only run jobs created by the Rails 2 version, and workers on Edge only those created by the Rails 3 version.

We solved it with a priority offset. Depending on the environment we have a DelayedJob offset parameter. This parameter was used during Job creation and added to the base priority of the job. On the Stable environment the offset was 0, on Edge it was 100. So jobs on Stable had priorities between 0 and 99, and Edge jobs had priorities between 100 and 199. We needed to adapt Workers initialization to add this offset when launching.

The workers for category A jobs on Stable were launched like this:

/script/delayed_job -n 2 --min-priority 0 --max-priority 4 run

The same workers on Edge:

/script/delayed_job -n 2 --min-priority 100 --max-priority 104 run
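The arithmetic behind the offset can be sketched in plain Ruby. The category ranges and offsets come from the text above. The constant and method names are hypothetical, and the real code works on DelayedJob priorities rather than on these helpers.

```ruby
# Base priority ranges per job category (values from the text above).
CATEGORY_RANGES = { a: 0..4, b: 5..9 }.freeze

# Priority offset per channel (values from the text above).
CHANNEL_OFFSETS = { stable: 0, edge: 100 }.freeze

# Priority stored on a job when it is created on a given channel.
def job_priority(base_priority, channel)
  base_priority + CHANNEL_OFFSETS.fetch(channel)
end

# Priority window a worker of a given category must poll on a given channel,
# i.e. the values passed as --min-priority and --max-priority.
def worker_range(category, channel)
  range = CATEGORY_RANGES.fetch(category)
  offset = CHANNEL_OFFSETS.fetch(channel)
  (range.min + offset)..(range.max + offset)
end

edge_job = job_priority(3, :edge)
puts "Edge job priority: #{edge_job}"
puts "Stable A workers poll #{worker_range(:a, :stable)}"
puts "Edge A workers poll #{worker_range(:a, :edge)}"
```

Because the windows of the two channels never overlap, a job created on Edge can only be picked up by an Edge worker of the matching category.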

5.4 QA and Customer Success department

QA has a huge impact on the migration process, ensuring that automated testing covers most of the features of Teixo. But not all features and user workflows are covered by automated testing. Some features, procedures and behaviors are complex and specific to certain customers, and thus are not covered by tests.

This is where the Customer Success department comes into play, ensuring that all those specific points work correctly, following a previously defined test plan. They ran several manual validations based on that plan and detected various issues and, most valuable, sources of issues even in untested parts of the application. All this testing was done in our staging environment on the Edge channel.

5.5 Stepping into production

When all the (known) issues were fixed, we deployed our brand new Teixo in production, on the Edge channel, accessible via a specific URL. At this point we had two different versions of Teixo working with the same database instance.

We started working on this edge version internally for a few days. 

Meanwhile, the Customer Success department started to talk to a selected group of customers, asking them if they would like to try the new Teixo release before it was fully open to every customer.

The advantage for them was that if there were any problems with their workflows on Teixo, we could fix them very quickly. Many customers agreed and we started to give them access to the Edge version. Each week more customers started to work on the new version until we had enough customers working on the platform.

This double production environment meant extra work, as explained earlier (section Double Trouble), but it was worth the effort.

When we considered the edge version was stable enough the final step was planned. It would require several stages:

  • Duplicate our edge environment in production creating a new one now known as stable, and redirect traffic from the stable url to the new environment.
  • Merge the edge branch code to the master branch.
  • This new Stable environment would be running code from the master branch.

When everything was ready and the date arrived, the change was relatively easy. We still had to face some issues when all the customers started using the new Teixo version, but they were easily handled and in a few days the situation was stable, since the more complex problems had already been solved.

6 Learned lessons

  • Nobody can handle your code better than your team. External teams may help and guide, but the best developers you will find for a project like this are already in-house.
  • If you team up with external help, teach them how your product works and why it behaves as it does. Be sure that the external team also understands your workflow and tools and, last but not least, keep communication with them fluent.
  • We made the right choice using two branches of code instead of the dual boot mechanism, and making major Rails steps instead of minor ones.
  • Expect more problems than the known ones, many more, especially if your project has grown away from standard gems and libraries.
  • Updating the core of your app may look like a good moment to clean and tidy code beyond the changes needed for updating Rails. But it is worth not being too ambitious, because big changes greatly increase the entropy that already exists in the project. On our first step to Rails 3.2 we also had to change all our AWS servers, and this brought us many problems on top of those related to the Rails update. Don’t try to enhance everything; take notes for future improvement tasks and do this cleaning and tidying before or after the update.
  • The non-technical part of the update process is as important to success as the technical one. Project testing, validation and customer management are key.
  • Don’t be afraid of monkey patches or workarounds. Keep them under control and try to fix them in a later phase of the project.

Date
25/5/23
Category
Technology