Abdulrahman AlQallaf

Decluttering my mind into the web ...







New tools for data analysis (ipython-sql and Azure Data Studio)

date posted: 2021-Oct-16, last edit date: 2021-Oct-16


Experimenting with ipython-sql

import pandas as pd
import os
from dotenv import load_dotenv
%load_ext sql


load environment variables and build databse connection string

load_dotenv() # take environment variables from .env
user = os.environ['DB_USER']
password = os.environ['DB_PASSWORD']
connection_string = "postgresql://{user}:{password}@localhost/ecommerce".format(user=user, password=password)


preview data

%%time
%%sql $connection_string

SELECT * FROM public.ecommerce
LIMIT 10
10 rows affected.
CPU times: user 33 ms, sys: 16.7 ms, total: 49.7 ms
Wall time: 74 ms
event_time event_type product_id category_id category_code brand price user_id user_session
2019-10-01 00:00:00 UTC view 44600062 2103807459595387724 None shiseido 35.79 541312140 72d76fde-8bb3-4e00-8c23-a032dfed738c
2019-10-01 00:00:00 UTC view 3900821 2053013552326770905 appliances.environment.water_heater aqua 33.20 554748717 9333dfbd-b87a-4708-9857-6336556b0fcc
2019-10-01 00:00:01 UTC view 17200506 2053013559792632471 furniture.living_room.sofa None 543.10 519107250 566511c2-e2e3-422b-b695-cf8e6e792ca8
2019-10-01 00:00:01 UTC view 1307067 2053013558920217191 computers.notebook lenovo 251.74 550050854 7c90fc70-0e80-4590-96f3-13c02c18c713
2019-10-01 00:00:04 UTC view 1004237 2053013555631882655 electronics.smartphone apple 1081.98 535871217 c6bd7419-2748-4c56-95b4-8cec9ff8b80d
2019-10-01 00:00:05 UTC view 1480613 2053013561092866779 computers.desktop pulser 908.62 512742880 0d0d91c2-c9c2-4e81-90a5-86594dec0db9
2019-10-01 00:00:08 UTC view 17300353 2053013553853497655 None creed 380.96 555447699 4fe811e9-91de-46da-90c3-bbd87ed3a65d
2019-10-01 00:00:08 UTC view 31500053 2053013558031024687 None luminarc 41.16 550978835 6280d577-25c8-4147-99a7-abc6048498d6
2019-10-01 00:00:10 UTC view 28719074 2053013565480109009 apparel.shoes.keds baden 102.71 520571932 ac1cd4e5-a3ce-4224-a2d7-ff660a105880
2019-10-01 00:00:11 UTC view 1004545 2053013555631882655 electronics.smartphone huawei 566.01 537918940 406c46ed-90a4-4787-a43b-59a410c1a5fb


check count

%%time
%%sql $connection_string

SELECT count(*) FROM public.ecommerce
1 rows affected.
CPU times: user 12.7 ms, sys: 12.8 ms, total: 25.5 ms
Wall time: 4min 23s
count
411709736


describe table

%%time
%%sql $connection_string

SELECT
   table_name,
   column_name,
   data_type
FROM
   information_schema.columns
WHERE
   table_name = 'ecommerce_2';
9 rows affected.
CPU times: user 240 µs, sys: 8.54 ms, total: 8.78 ms
Wall time: 19.4 ms
table_name column_name data_type
ecommerce_2 event_time timestamp without time zone
ecommerce_2 event_type text
ecommerce_2 product_id text
ecommerce_2 category_id text
ecommerce_2 category_code text
ecommerce_2 brand text
ecommerce_2 price real
ecommerce_2 user_id bigint
ecommerce_2 user_session text


plot summarized data

%%time

query = """
SELECT    date_part('year', event_time)*100+date_part('month', event_time) as year_month
          ,count(*) as cnt
from      ecommerce_2
where     event_time >= '2019-10-01'
group by  year_month
order by  year_month
"""

result = %sql $connection_string $query
print (result)
7 rows affected.
+------------+----------+
| year_month |   cnt    |
+------------+----------+
|  201910.0  | 42482059 |
|  201911.0  | 67503595 |
|  201912.0  | 67514039 |
|  202001.0  | 56019468 |
|  202002.0  | 55291449 |
|  202003.0  | 56335286 |
|  202004.0  | 66539750 |
+------------+----------+
CPU times: user 13.2 ms, sys: 14.7 ms, total: 27.9 ms
Wall time: 4min 4s
import matplotlib.pyplot as plt

df = result.DataFrame()
df['year_month_2'] = df['year_month'].astype('int32').astype('str')
df.plot(x='year_month_2', y='cnt')
plot output


Final comments:

  • good integration between SQL and Python pandas
  • not a subtitute in my opinion to a software like (MS-SSMS, pgAdmin, Oracle SQL Developer)
  • no syntax highlighting
  • does not show execution time by default





# Post Title Date Posted Last Edit Date
1 [Medium Article] 12 Years in Data & AI: Lesson #2 — How Organizational Structure Determines Data & AI Success 2026-03-18 2026-03-18
2 [Medium Article] 12 Years in Data & AI: Lessons Organizations Keep Learning the Hard Way 2026-03-07 2026-03-11
3 [Medium Article] Kuwait's Addition to the FATF Grey List — Why Implementation Matters More Than Regulation 2026-02-18 2026-03-11
4 current post -- New tools for data analysis (ipython-sql and Azure Data Studio) 2021-10-16 2021-10-16
5 How to setup a remote jupyter server 2021-10-13 2026-02-02
6 How to setup ZSH 2021-10-12 2021-10-12
7 Concepts in Data Architecture 2021-05-01 2026-02-02
8 Loading data into PostgreSQL reference 2021-03-19 2026-02-02
9 SQLite <--> Pandas reference 2021-03-15 2021-03-15
10 Creating an automated data mart (in SQL) 2020-07-16 2020-07-16
11 Personal finance management and open banking 2020-07-12 2020-07-12
12 Quick SQL reference 2020-07-03 2026-02-02
13 How to back up Digital Ocean Spaces (S3 compatible storage) 2020-06-07 2020-07-03
14 PostgreSQL initiation 2020-05-25 2020-06-01