Issue
I my data looks like below. I am trying to pivot the dataframe such that SCHEMA NAME AND TABLE NAME are in columns and Row Count, Table Type, date created and Date altered are in 1st column. I am referencing this post to achieve my desired output. Unfortunately, cannot find the solution that I need.Below is my code and desired output. The code gets SCHEMA_NAME and "TABLE_NAME" in columns which is correct, only the index and value part of the code is not producing desired output
Thanks in advance for your time and efforts!
Dataframe:
{'TABLE_SCHEMA': 0 TPCDS_SF100TCL
1 TPCDS_SF100TCL
2 TPCDS_SF100TCL
3 TPCDS_SF100TCL
4 TPCDS_SF100TCL
5 TPCDS_SF10TCL
6 TPCDS_SF10TCL
7 TPCDS_SF10TCL
8 TPCDS_SF10TCL
9 TPCDS_SF10TCL
Name: TABLE_SCHEMA, dtype: object,
'TABLE_TYPE': 0 BASE TABLE
1 BASE TABLE
2 BASE TABLE
3 BASE TABLE
4 BASE TABLE
5 BASE TABLE
6 BASE TABLE
7 BASE TABLE
8 BASE TABLE
9 BASE TABLE
Name: TABLE_TYPE, dtype: object,
'TABLE_NAME': 0 CALL_CENTER
1 CATALOG_PAGE
2 CUSTOMER
3 CUSTOMER_ADDRESS
4 CUSTOMER_DEMOGRAPHICS
5 CALL_CENTER
6 CATALOG_PAGE
7 CUSTOMER
8 CUSTOMER_DEMOGRAPHICS
9 CUSTOMER_ADDRESS
Name: TABLE_NAME, dtype: object,
'ROW_COUNT': 0 60
1 50000
2 100000000
3 50000000
4 1920800
5 54
6 40000
7 65000000
8 1920800
9 32500000
Name: ROW_COUNT, dtype: object,
'TABLE_CREATED_DATE': 0 2022-03-02
1 2022-03-02
2 2022-03-02
3 2022-03-02
4 2022-03-02
5 2022-03-02
6 2022-03-02
7 2022-03-02
8 2022-03-02
9 2022-03-02
Name: TABLE_CREATED_DATE, dtype: object,
'LAST_ALTERED_DATE': 0 2022-05-06
1 2022-03-02
2 2022-03-02
3 2022-03-02
4 2022-03-02
5 2022-03-02
6 2022-03-02
7 2022-03-02
8 2022-03-02
9 2022-03-02
Name: LAST_ALTERED_DATE, dtype: object}
Python Code:
pd.pivot(df, columns = ["TABLE_SCHEMA","TABLE_NAME"],index=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'],
values=['ROW_COUNT','TABLE_TYPE','TABLE_CREATED_DATE','LAST_ALTERED_DATE'])
Desired outuput (Below output is for 1 schema I need both schemas in single table)
TABLE_SCHEMA TPCDS_SF100TCL
TABLE_NAME CALL_CENTER CATALOG_PAGE CUSTOMER CUSTOMER_ADDRESS CUSTOMER_DEMOGRAPHICS
ROW_COUNT
TABLE_TYPE
TABLE_CREATED_DATE
LAST_ALTERED_DATE
Solution
try this:
data_df.set_index(['TABLE_SCHEMA', 'TABLE_NAME'], drop=True).T
data_df is the original dataframe that you provided
Answered By - user2990604
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.