Abdulrahman AlQallaf

Decluttering my mind into the web ...







Creating an automated data mart (in SQL)

date posted: 2020-Jul-16, last edit date: 2020-Jul-16


I recently created a data mart for work, no one have asked me to do it but I did it mainly to make my life easier. With time, it started being used by other people in my department so it had to be automated.

 

These are the things that I worked on providing (in addition to the data mart):

  • Provided a data dictionary + proper documentation.
  • Performance logging functionality (output written to a table).
  • Automated pre/post testing (output written to a table).

 

Since the scripts creating the data mart can be executed in parallel, I had to embed the performance logging functionality within each script, and each script creates its own little table (to avoid locking) and then put everyhting into one single table for reference.

I didn't want to do the same with testing as it would make the code less clean, so I had to split the testing into pre-testing and post-testing, all writing into a single table, then check at the end if there were any issues with execution (I know, it is not ideal as I had to wait for the whole thing to run and then check whether or not there were any issues, as opposed to stopping execution as soon as an issue is detected). This is not ideal but it gets the job done (it is not like I am given dedicated time to automate anyway).

 

At the end, I settled with the following structure for tests:

drop table if exists test_log
create table test_log (
         run_datetime           datetime                not null        -- datetime of when the test was executed
        ,run_user_id            varchar(20)             not null        -- test was executed by which user id
        ,run_session_id         int                     not null        -- session id (to be able to group by session id and create a summary)
        ,test_id                int                     not null        -- test identifier
        ,test_desc              clob                    not null        -- test description
        ,test_type              varchar(20)             not null        -- <pre, post>
        ,test_comment           clob                    null            -- whatever you want to be logged
        ,test_outcome           varchar(200)            not null        -- <success, warning, fail>
)
commit;

 

And the following simple structure for performance logging:

drop table if exists performance_log
create table performance_log (
        exec_datetime           datetime        null,
        exec_desc               clob            null,
        exec_source             varchar(200)    null    
)
commit;

 






# Post Title Date Posted Last Edit Date
1 New tools for data analysis (ipython-sql and Azure Data Studio) 2021-10-16 2021-10-16
2 How to setup a remote jupyter server 2021-10-13 2021-10-13
3 How to setup ZSH 2021-10-12 2021-10-12
4 Concepts in Data Architecture 2021-05-01 2021-10-16
5 Loading data into PostgreSQL reference 2021-03-19 2021-03-19
6 SQLite <--> Pandas reference 2021-03-15 2021-03-15
7 current post -- Creating an automated data mart (in SQL) 2020-07-16 2020-07-16
8 Personal finance management and open banking 2020-07-12 2020-07-12
9 Quick SQL reference 2020-07-03 2020-11-29
10 How to back up Digital Ocean Spaces (S3 compatible storage) 2020-06-07 2020-07-03
11 PostgreSQL initiation 2020-05-25 2020-06-01