Decluttering my mind into the web ...
date posted: 2020-Jul-16, last edit date: 2020-Jul-16
I recently created a data mart at work. No one had asked me to do it; I built it mainly to make my own life easier. Over time, other people in my department started using it, so it had to be automated.
In addition to the data mart itself, these are the things I worked on providing:
Since the scripts that create the data mart can be executed in parallel, I had to embed the performance-logging functionality within each script. Each script writes to its own little table (to avoid locking), and at the end everything is consolidated into a single table for reference.
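The per-script-table-then-consolidate pattern can be sketched as follows. This is a minimal illustration using Python with SQLite; the table names and the two-script setup are hypothetical, not from the actual data mart:

```python
import sqlite3

# Each parallel script writes to its own staging table (names here are
# illustrative) so scripts never contend for a lock on a shared table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

script_tables = ["perf_log_script1", "perf_log_script2"]
for t in script_tables:
    cur.execute(f"create table {t} (exec_datetime text, exec_desc text, exec_source text)")
    cur.execute(f"insert into {t} values (datetime('now'), 'step done', '{t}')")

# Once all scripts have finished, consolidate the per-script tables
# into one single table for reference.
cur.execute("create table performance_log (exec_datetime text, exec_desc text, exec_source text)")
for t in script_tables:
    cur.execute(f"insert into performance_log select * from {t}")
conn.commit()

print(cur.execute("select count(*) from performance_log").fetchone()[0])
```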
I didn't want to do the same with testing, as it would make the code less clean, so I split the tests into pre-tests and post-tests, all writing into a single table, and then checked at the end of the run whether there were any issues with execution. I know this is not ideal: I have to wait for the whole thing to run and then check for issues, as opposed to stopping execution as soon as an issue is detected. Still, it gets the job done (it is not like I am given dedicated time to automate anyway).
In the end, I settled on the following structure for the tests:
```sql
drop table if exists test_log;

create table test_log (
     run_datetime   datetime     not null -- datetime of when the test was executed
    ,run_user_id    varchar(20)  not null -- test was executed by which user id
    ,run_session_id int          not null -- session id (to be able to group by session id and create a summary)
    ,test_id        int          not null -- test identifier
    ,test_desc      clob         not null -- test description
    ,test_type      varchar(20)  not null -- <pre, post>
    ,test_comment   clob         null     -- whatever you want to be logged
    ,test_outcome   varchar(200) not null -- <success, warning, fail>
);

commit;
```
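To show how the end-of-run check works against this structure, here is a small sketch in Python with SQLite (`clob` and `datetime` become `text`; the inserted test rows and user id are made up for illustration):

```python
import sqlite3

# Same columns as the test_log DDL above, with SQLite-friendly types.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
create table test_log (
     run_datetime   text         not null
    ,run_user_id    varchar(20)  not null
    ,run_session_id int          not null
    ,test_id        int          not null
    ,test_desc      text         not null
    ,test_type      varchar(20)  not null  -- <pre, post>
    ,test_comment   text         null
    ,test_outcome   varchar(200) not null  -- <success, warning, fail>
)""")

# Each script logs its pre- and post-tests as it runs...
cur.execute("insert into test_log values (datetime('now'), 'me', 1, 10, 'row count > 0',  'pre',  null, 'success')")
cur.execute("insert into test_log values (datetime('now'), 'me', 1, 20, 'no orphan keys', 'post', null, 'fail')")

# ...and at the end of the whole run, one summary query (grouped by
# session id) flags whether anything needs attention.
for row in cur.execute("""
    select run_session_id, test_outcome, count(*)
    from test_log
    group by run_session_id, test_outcome
    order by test_outcome"""):
    print(row)
```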
And the following simple structure for performance logging:
```sql
drop table if exists performance_log;

create table performance_log (
     exec_datetime datetime     null -- when the entry was logged
    ,exec_desc     clob         null -- description of what was executed
    ,exec_source   varchar(200) null -- which script logged the entry
);

commit;
```
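As a usage sketch, each script can log a row at the start and end of its run, and elapsed time per script can then be derived from the log. Again in Python with SQLite for illustration; the script name and timestamps are hypothetical:

```python
import sqlite3

# Simplified performance_log (text columns instead of datetime/clob).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table performance_log (exec_datetime text, exec_desc text, exec_source text)")
cur.execute("insert into performance_log values ('2020-07-16 10:00:00', 'start', 'load_customers.sql')")
cur.execute("insert into performance_log values ('2020-07-16 10:03:30', 'end',   'load_customers.sql')")

# Elapsed seconds per script, from the first to the last logged row.
query = """
select exec_source,
       strftime('%s', max(exec_datetime)) - strftime('%s', min(exec_datetime)) as elapsed_sec
from performance_log
group by exec_source
"""
for source, elapsed in cur.execute(query):
    print(source, elapsed)
```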