Fix Psych loading errors on wisper-sidekiq

One of my favorite rubygems is Wisper, a simple library that lets you add pub/sub style broadcasting and listeners to your app. (Been a fan since it came out in 2014, almost ten years ago!)

I tried to use wisper again recently, specifically with the wisper-sidekiq companion gem, which allows your subscribers to execute asynchronously as Sidekiq jobs.

Unfortunately, I immediately ran into an issue with Psych complaining about my parameters (ActiveRecord model instances) not being allowed by the YAML loading subsystem. If you're running into this same kind of issue, you'll know because you get exceptions that look like Tried to load unspecified class: Account (Psych::DisallowedClass)

Sidestepping the question of whether you should stick to only sending primitive object parameters (strings, integers, etc.) as arguments to Sidekiq jobs, here is the monkeypatch that solves the problem, tested with wisper-sidekiq version 1.3.0.

Chuck this override into an initializer file.

module Wisper
  class SidekiqBroadcaster
    class Worker
      include ::Sidekiq::Worker

      def perform(yml)
        (subscriber, event, args) = ::YAML.unsafe_load(yml)
        subscriber.public_send(event, *args)
      end
    end
  end
end

The fix is the replacement of YAML.load with YAML.unsafe_load in the worker's perform method.

But Obie, isn't this dangerous? No. Objects passed in broadcast events are almost certainly not coming from the outside world in any way, shape, or form, so the reasons that you would typically be interested in blocking YAML loading from processing arbitrary objects do not apply.

But Obie, if your queues back up, won't you have stale data sitting in Redis as parameters to your jobs? For my particular use case this is not a concern, but for yours it might be. Either way, it's not the job of the wisper-sidekiq library to enforce this constraint... and indeed, since the library predates the safety additions to YAML.load, I'm not even sure the author intended for the constraint to exist.

Now what would really be cool, and I wish I had time to implement this myself, is if wisper-sidekiq would automatically turn ActiveRecord params into GlobalID identifiers and look them up for you on the consumer side, the way ActiveJob does. Somebody should definitely implement something like that!
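
For what it's worth, here's a very rough sketch of the idea. None of this exists in wisper-sidekiq; the helper names are hypothetical, and you would still need to hook them into the broadcaster's serialization and the worker's perform:

# Hypothetical helpers, not part of wisper-sidekiq
def globalize(args)
  # Replace ActiveRecord models with their GlobalID URI strings before enqueueing
  args.map { |arg| arg.respond_to?(:to_global_id) ? arg.to_global_id.to_s : arg }
end

def localize(args)
  # Turn "gid://..." strings back into records on the consumer side
  args.map do |arg|
    arg.is_a?(String) && arg.start_with?("gid://") ? GlobalID::Locator.locate(arg) : arg
  end
end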

Quick Ruby client for Marqo (VectorDB)

Marqo is a cool vector DB that lets you store facts and then query them later with natural language. You can install Marqo using Docker, which I managed to do today on my M1 Mac. Just make sure you stop whatever ElasticSearch or OpenSearch instances you may already have running on your machine first, since Marqo wants to use its own.

Once you have it running, you can use the following Ruby class to access it in your Rails project.

require 'httparty'

class Marqo
  include HTTParty

  base_uri 'http://localhost:8882'

  def initialize(auth = { username: 'admin', password: 'admin' })
    @auth = auth
  end

  def store(index_name, name, detail)
    options = {
      headers: { 'Content-Type' => 'application/json' },
      body: [{name: name, detail: detail}].to_json
    }
    self.class.post("/indexes/#{index_name}/documents", options)
  end

  def search(index_name, query)
    options = {
      basic_auth: @auth,
      headers: { 'Content-Type' => 'application/json' },
      body: { q: query }.to_json
    }
    self.class.post("/indexes/#{index_name}/search", options)
  end

  def self.client
    @client ||= new
  end
end
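
Usage looks something like this (the index name and content here are just made-up examples):

Marqo.client.store('facts', 'matz', 'Ruby was created by Yukihiro Matsumoto')
Marqo.client.search('facts', 'who created ruby?')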

The data from my personal TIL site is now migrated. Here's how I handled URL redirects in Cloudflare

To set up a subdomain redirect with Cloudflare while preserving the request path and parameters, you can use Page Rules.

In the Cloudflare dashboard for your domain, click on the Page Rules link in the Rules section of the main navigation menu on the left of the page.

Click on the Create Page Rule button to start configuring the redirect.

In the If the URL matches field, enter the pattern for the URLs you want to redirect, using a wildcard (*) to capture the path and query string. In this case, the pattern is:

til.obiefernandez.com/*

Now click on Add a Setting and select Forwarding URL from the dropdown menu.

Select 301 - Permanent Redirect from the Status Code dropdown.

In the Enter destination URL field, enter the URL where you want to redirect your requests to, using the $1 placeholder to preserve the captured path and query string.

For example: https://til.magmalabs.io/$1

Click on Save and Deploy to create the Page Rule and enable the redirect.
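
The finished rule ends up looking like this:

If the URL matches: til.obiefernandez.com/*
Setting:            Forwarding URL (301 - Permanent Redirect)
Destination URL:    https://til.magmalabs.io/$1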

CSS Container Queries

I've always seen responsive elements styled with CSS media queries, which often need more and more media queries to look good in different sections of a page. There's a way to skip that headache.

Use container queries to apply CSS styles based on the element's container size instead of the viewport size:

/* Create a container context */
.container {
  container-type: inline-size;
}

/* Default font size */
.card h2 {
  font-size: 1em;
}

/* Updated font size when the container is larger than 500px */
@container (min-width: 500px) {
  .card h2 {
    font-size: 2em;
    color: gray;
  }
}

And the markup:

<div class="container">
  <div class="card">
    <h2>Card title</h2>
  </div>
</div>


Separate health check endpoint using puma

Puma offers a way to query its internal stats by enabling a control app on a separate port. This can be useful when we need to know if the app is alive, and it is different from a normal health check endpoint because the request is not processed by Rails at all.

To enable this functionality, all you need to do is add this line to your puma.rb file:

activate_control_app 'tcp://0.0.0.0:9293', { no_token: true }

It will start a second web server on port 9293 that can be queried by monitoring tools or even a load balancer health check.

ecruz@Edwins-MBP % curl 'http://127.0.0.1:9293/stats'
{"started_at":"2021-10-15T21:39:55Z","workers":2,"phase":0,"booted_workers":2,"old_workers":0,"worker_status":[{"started_at":"2021-10-15T21:39:55Z","pid":44969,"index":0,"phase":0,"booted":true,"last_checkin":"2021-10-15T21:40:05Z","last_status":{"backlog":0,"running":5,"pool_capacity":5,"max_threads":5,"requests_count":0}},{"started_at":"2021-10-15T21:39:55Z","pid":44970,"index":1,"phase":0,"booted":true,"last_checkin":"2021-10-15T21:40:05Z","last_status":{"backlog":0,"running":5,"pool_capacity":5,"max_threads":5,"requests_count":0}}]}%
ecruz@Edwins-MBP %
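
If you'd rather check it from Ruby, here is a minimal readiness sketch (assuming the no_token setup above); it just compares booted_workers to workers from the /stats payload:

require 'json'
require 'net/http'

stats = JSON.parse(Net::HTTP.get(URI('http://127.0.0.1:9293/stats')))
ready = stats['booted_workers'] == stats['workers']
puts ready ? 'ready' : 'not ready'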

Check the Puma documentation for more options and usage.

Eager load Rails associations with nested scopes

It is common to apply extra scopes when fetching ActiveRecord relationships. For example, if we have countries and states, we might want all the countries starting with the letter A and all their states starting with the letter B. This naturally creates an N+1 query problem, since it needs to iterate over each country and fetch its states. But Rails provides a way to eager load these associations easily:

states_scope = State.where("name ilike 'b%'")
countries = Country.where("name ilike 'a%'")
# This is the magic
ActiveRecord::Associations::Preloader.new.preload(countries, :states, states_scope)

# Now you can iterate countries and access states without N+1 queries
countries.map { |country| { country.id => country.states.size } }
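
Note that this is the Preloader API from Rails 6.1 and earlier. In Rails 7 the constructor takes keyword arguments instead, so the equivalent call would look roughly like this (verify against your Rails version):

ActiveRecord::Associations::Preloader.new(
  records: countries,
  associations: :states,
  scope: states_scope
).call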

Normally, you would have to define another association in order to eager load it with a scope, but that is not needed with this approach:

class Country < AR::Base
  has_many :states
  has_many :states_starting_with_b, -> { where("name ilike 'b%'") }, foreign_key: :country_id, class_name: "State"
end

# Then
Country.includes(:states_starting_with_b).where("name ilike 'a%'")

But this approach does not scale well; it requires defining tons of extra associations.

How to understand the SQL behind an ActiveRecord Query

Here's a quick example showing how an ActiveRecord query built from scopes or model methods translates to SQL, and how to run that SQL directly, all in the same example.

Ruby version

Let's say you have a database with users, and you have some scopes to filter by non-demo, active, and completed providers with a specific Role (CanCanCan behind the scenes):

User.providers.real.active.completed

Let's look into the SQL version:

If we run this command

irb(main):007:0> User.providers.real.active.completed.to_sql

We would get this:

=> "SELECT \"users\".* FROM \"users\" INNER JOIN \"users_roles\" ON \"users_roles\".\"user_id\" = \"users\".\"id\" INNER JOIN \"roles\" ON \"roles\".\"id\" = \"users_roles\".\"role_id\" WHERE (roles.name = 'Provider') AND \"users\".\"demo_account\" = 'f' AND (\"users\".\"organization_id\" NOT IN (SELECT \"organizations\".\"id\" FROM \"organizations\" WHERE \"organizations\".\"demo_account\" = 't' ORDER BY \"organizations\".\"name\" ASC)) AND \"users\".\"is_active\" = 't' AND \"users\".\"signup_state\" = 'Completed'"

Then we just need to remove the escape characters such as \ and translate that into a readable SQL version like this:

SUMMARY_PROVIDERS_REPORT_SQL = <<-SQL
WITH real_providers AS
  ( SELECT users.* FROM users
    INNER JOIN users_roles ON users_roles.user_id = users.id
    INNER JOIN roles ON roles.id = users_roles.role_id
    WHERE (roles.name = 'Provider') AND users.demo_account = 'f'
      AND (users.organization_id NOT IN (
        SELECT organizations.id FROM organizations WHERE organizations.demo_account = 't' ORDER BY organizations.name ASC)
      )
      AND users.is_active = 't' AND users.signup_state = 'Completed')
SELECT real_providers.id, real_providers.name FROM real_providers
SQL

In order to run it in the Rails console for example:

report_results = ActiveRecord::Base.connection.execute(SUMMARY_PROVIDERS_REPORT_SQL)
report_results.entries

That will give you the id and name of all those valid providers.
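
Each entry comes back as a hash keyed by column name (at least with the PostgreSQL adapter), so printing the report is straightforward:

report_results.entries.each do |row|
  puts "#{row['id']}: #{row['name']}"
end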

Using Upsert with Rails

Some database systems natively support an operation that behaves like "update or insert". Rails recently added the upsert method to take advantage of it.

It is useful for updating information that does not need to run validations, in a super performant way. Here's an example:

# We usually do this:
activity = current_user.activity
if activity
  activity.update(last_seen_at: Time.current)
else
  current_user.create_activity(last_seen_at: Time.current)
end

But as you can see, it does not need to run any validation; it just needs to update last_seen_at if the record exists or create a new one. It also performs two queries: one to instantiate the activity object and a second one that performs the real update/insert statement.

That can be replaced with the following code, which performs just a single query and takes care of either creating the record or updating an existing one:

Activity.upsert({ last_seen_at: Time.current, user_id: current_user.id}, unique_by: :user_id)

To make this really work (assuming you use PostgreSQL), you have to add a unique index on user_id and change the default values of created_at and updated_at in a migration like this:

query = <<-SQL
  ALTER TABLE #{Activity.table_name}
  ALTER COLUMN created_at SET DEFAULT CURRENT_TIMESTAMP,
  ALTER COLUMN updated_at SET DEFAULT CURRENT_TIMESTAMP
SQL
execute(query)
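
The unique index mentioned above can go in the same migration; a minimal sketch, assuming the table is named activities:

add_index :activities, :user_id, unique: true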

Updating ActiveRecord models without loading an instance using Postgresql

Sometimes we need to update an ActiveRecord model, but it is not necessary to load an instance. The normal flow would be the following:

profile = UserProfile.find_by(user_id: id)
profile.update(last_seen_at: Time.now)

The problem with this is that we load a useless instance of UserProfile that is not needed; on high traffic sites, an extra SELECT query can count a lot. Luckily, Rails has addressed this with the upsert command:

UserProfile.upsert({ last_seen_at: Time.now, user_id: id }, unique_by: :user_id)

This will use PostgreSQL's native upsert support (INSERT ... ON CONFLICT) to update the record if it exists or insert a new one. No SELECT is performed; everything happens in a single query instead of two.

To make it really work, you need to modify your created_at and updated_at columns to default to CURRENT_TIMESTAMP, as shown in the previous tip.

When to eager load relationships in Rails and when it is not that good

Rails provides a way to eager load relationships when fetching objects; the main idea is to avoid N+1 queries. But when is it a good idea to do it, and when is it not?

Good

When rendering unique results that cannot be cached, for example: table reports

Why? Most of the time you need to display related information.

orders = Order.includes(:user, :line_items).completed

Try to avoid

When you use fragment cache

Why? Eager loading is executed before rendering, regardless of whether the final result is already cached. With eager loading, those queries always run; if you instead allow the N+1 queries, each one runs once to fill the fragment cache and then never again.

products = Product.includes(:categories, variants: [:price]).search(keywords)

Use product.id and updated_at to build the fragment cache key and fetch the data from the database only when needed; no extra info such as variants, categories, or prices has to be loaded up front.
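
For example, a cache fetch keyed on the product id and updated_at (the rendering helper below is hypothetical) only touches variants, categories, and prices on a cache miss:

products.each do |product|
  Rails.cache.fetch([product.id, product.updated_at]) do
    # Only executed on a cache miss, which is when the associations actually get used
    render_product_card(product) # hypothetical helper
  end
end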

Beware of calling #count on Active Record relations!

Given code like this:

records = Record.includes(:related).all # Eager-loads to prevent N+1 queries...
records.each do |record|
  puts record.related.count # => ... but this produces N+1 queries anyway!
end

If you run this, you'll notice you get an N+1 queries problem, even though we're using #includes. This happens because of record.related.count. Remember, record.related here is not an Array but an instance of CollectionProxy, and its #count method always reaches out to the database. Use #length or #size instead to solve this issue.

records = Record.includes(:related).all
records.each do |record|
  puts record.related.length # Problem solved!
end

How to add timeouts to slow queries

Sometimes some of your queries are taking too long to execute; you can specify optimizer hints and define timeouts for those queries.

Employee.optimizer_hints("MAX_EXECUTION_TIME(5000)").all

It will raise an ActiveRecord::StatementTimeout exception if the query takes longer than the specified limit (5000 ms in the example above).
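
If you want to handle that gracefully instead of letting the request or job blow up, a minimal sketch:

begin
  Employee.optimizer_hints("MAX_EXECUTION_TIME(5000)").to_a
rescue ActiveRecord::StatementTimeout => e
  Rails.logger.warn("employees query timed out: #{e.message}")
  []
end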

Example (for PostgreSQL with pg_hint_plan):

Employee.optimizer_hints("SeqScan(employees)", "Parallel(employees 8)")

Example (for MySQL):

Employee.optimizer_hints("MAX_EXECUTION_TIME(50000)", "NO_INDEX_MERGE(employees)")

There are many causes for sudden slow queries, such as missing indexes, incorrect caching, or general database performance issues.

But this is a topic for another day!