Converting an Analytics System from Postgres to Redshift

A case study of migrating an analytical system migration from PostgreSQL to AWS Redshift using the Vertabelo data modeler.

Online systems tend to track user’s actions. Gathering information about users’ behavior can increase the quality of their experience, which can lead to increased business income. In this article, we will show how to reimplement an existing Postgres database to a more complex analytics database like Amazon Redshift.

The solution we want to reengineer is a tracking system for an online SQL learning platform like LearnSQL.com.

The source implementation is built on a PostgreSQL database and contains two main tables:

The source implementation is built on a PostgreSQL database and contains two main tables:

  1. action – Tracks all users actions. Each record contains:
    1. user_profile_id – A unique identifier for each user.
    2. session_id – An identifier for each unique learning experience.
    3. action_time – The timestamp when this action occurred.
    4. course_id – The course being taken by the user when the action occurred.
    5. exercise_id – The exercise being completed by the user.
    6. is_exercise_checked – A true/false value that indicates if the exercise required user interaction.
    7. is_async – Not important for the purpose of this article.
    8. query – The SQL query which the user entered to complete the exercise.
  2. action_type – A dictionary table containing three unchanging columns (id, code, and name). It stores many kinds of possible user actions, the most important of which are:
    1. login – The user logged into the system.
    2. logout – The user logged out of the system.
    3. logout_session_expired – The user’s session expired, i.e. there has been no action for more than a few minutes.
    4. course_start – When the first exercise in the course was completed.
    5. course_finish – When all the exercises in the course were completed.
    6. course_page_open – When a course’s page was opened.
    7. exercise_page_open – When an exercise’s page was opened.
    8. execute_exercise_code – When the user executed some SQL code.
    9. successful_exercise_code_execution – Indicates if the user finished the exercise.
    10. failed_exercise_code_execution – Indicates if the user failed to finish the exercise.

redshift, analytics system

This information allows analytics operators to track the time spent by users. For example, the following SQL query returns the overall time spent by a user and the average session length:

The above query uses the WITH clause. First, we select users’ sessions as a time between the last action max(action_time) and the first action min(action_time) as the scoped view user_sessions. The view is then used in the final query, where the SUM() of the total time spent and the AVG() session time are calculated for each user.

The result for this query is:

Tools in this post

Redgate Data Modeler

Design, update, and manage database schemas

Find out more