Decluttering my mind into the web ...
date posted: 2021-Oct-16, last edit date: 2021-Oct-16
import pandas as pd
import os
from dotenv import load_dotenv
%load_ext sql
load_dotenv() # take environment variables from .env
user = os.environ['DB_USER']
password = os.environ['DB_PASSWORD']
connection_string = "postgresql://{user}:{password}@localhost/ecommerce".format(user=user, password=password)
%%time
%%sql $connection_string
SELECT * FROM public.ecommerce
LIMIT 10
10 rows affected. CPU times: user 33 ms, sys: 16.7 ms, total: 49.7 ms Wall time: 74 ms
event_time | event_type | product_id | category_id | category_code | brand | price | user_id | user_session |
---|---|---|---|---|---|---|---|---|
2019-10-01 00:00:00 UTC | view | 44600062 | 2103807459595387724 | None | shiseido | 35.79 | 541312140 | 72d76fde-8bb3-4e00-8c23-a032dfed738c |
2019-10-01 00:00:00 UTC | view | 3900821 | 2053013552326770905 | appliances.environment.water_heater | aqua | 33.20 | 554748717 | 9333dfbd-b87a-4708-9857-6336556b0fcc |
2019-10-01 00:00:01 UTC | view | 17200506 | 2053013559792632471 | furniture.living_room.sofa | None | 543.10 | 519107250 | 566511c2-e2e3-422b-b695-cf8e6e792ca8 |
2019-10-01 00:00:01 UTC | view | 1307067 | 2053013558920217191 | computers.notebook | lenovo | 251.74 | 550050854 | 7c90fc70-0e80-4590-96f3-13c02c18c713 |
2019-10-01 00:00:04 UTC | view | 1004237 | 2053013555631882655 | electronics.smartphone | apple | 1081.98 | 535871217 | c6bd7419-2748-4c56-95b4-8cec9ff8b80d |
2019-10-01 00:00:05 UTC | view | 1480613 | 2053013561092866779 | computers.desktop | pulser | 908.62 | 512742880 | 0d0d91c2-c9c2-4e81-90a5-86594dec0db9 |
2019-10-01 00:00:08 UTC | view | 17300353 | 2053013553853497655 | None | creed | 380.96 | 555447699 | 4fe811e9-91de-46da-90c3-bbd87ed3a65d |
2019-10-01 00:00:08 UTC | view | 31500053 | 2053013558031024687 | None | luminarc | 41.16 | 550978835 | 6280d577-25c8-4147-99a7-abc6048498d6 |
2019-10-01 00:00:10 UTC | view | 28719074 | 2053013565480109009 | apparel.shoes.keds | baden | 102.71 | 520571932 | ac1cd4e5-a3ce-4224-a2d7-ff660a105880 |
2019-10-01 00:00:11 UTC | view | 1004545 | 2053013555631882655 | electronics.smartphone | huawei | 566.01 | 537918940 | 406c46ed-90a4-4787-a43b-59a410c1a5fb |
%%time
%%sql $connection_string
SELECT count(*) FROM public.ecommerce
1 rows affected. CPU times: user 12.7 ms, sys: 12.8 ms, total: 25.5 ms Wall time: 4min 23s
count |
---|
411709736 |
%%time
%%sql $connection_string
SELECT
table_name,
column_name,
data_type
FROM
information_schema.columns
WHERE
table_name = 'ecommerce_2';
9 rows affected. CPU times: user 240 µs, sys: 8.54 ms, total: 8.78 ms Wall time: 19.4 ms
table_name | column_name | data_type |
---|---|---|
ecommerce_2 | event_time | timestamp without time zone |
ecommerce_2 | event_type | text |
ecommerce_2 | product_id | text |
ecommerce_2 | category_id | text |
ecommerce_2 | category_code | text |
ecommerce_2 | brand | text |
ecommerce_2 | price | real |
ecommerce_2 | user_id | bigint |
ecommerce_2 | user_session | text |
%%time
query = """
SELECT date_part('year', event_time)*100+date_part('month', event_time) as year_month
,count(*) as cnt
from ecommerce_2
where event_time >= '2019-10-01'
group by year_month
order by year_month
"""
result = %sql $connection_string $query
print (result)
7 rows affected. +------------+----------+ | year_month | cnt | +------------+----------+ | 201910.0 | 42482059 | | 201911.0 | 67503595 | | 201912.0 | 67514039 | | 202001.0 | 56019468 | | 202002.0 | 55291449 | | 202003.0 | 56335286 | | 202004.0 | 66539750 | +------------+----------+ CPU times: user 13.2 ms, sys: 14.7 ms, total: 27.9 ms Wall time: 4min 4s
import matplotlib.pyplot as plt
df = result.DataFrame()
df['year_month_2'] = df['year_month'].astype('int32').astype('str')
df.plot(x='year_month_2', y='cnt')
<AxesSubplot:xlabel='year_month_2'>
# | Post Title | Date Posted | Last Edit Date |
---|---|---|---|
1 | current post -- New tools for data analysis (ipython-sql and Azure Data Studio) | 2021-10-16 | 2021-10-16 |
2 | How to setup a remote jupyter server | 2021-10-13 | 2021-10-13 |
3 | How to setup ZSH | 2021-10-12 | 2021-10-12 |
4 | Concepts in Data Architecture | 2021-05-01 | 2021-10-16 |
5 | Loading data into PostgreSQL reference | 2021-03-19 | 2021-03-19 |
6 | SQLite <--> Pandas reference | 2021-03-15 | 2021-03-15 |
7 | Creating an automated data mart (in SQL) | 2020-07-16 | 2020-07-16 |
8 | Personal finance management and open banking | 2020-07-12 | 2020-07-12 |
9 | Quick SQL reference | 2020-07-03 | 2020-11-29 |
10 | How to back up Digital Ocean Spaces (S3 compatible storage) | 2020-06-07 | 2020-07-03 |
11 | PostgreSQL initiation | 2020-05-25 | 2020-06-01 |