Issue
I've found out that same fields/columns have different business context for certain regions in my dataset.
I have few markets but lets take for sample example - GB and US. Most of those columns have the same meaning, but there are pairs which not, like:
SB1 for US is Strength Evaluation
SB2 for US is Power Evaluation
while
SB1 for GB is Power Evaluation
SB2 for GB is Strength Evaluation
and its case in whole dataset that one country SB1 is Power SB2 strength, for another its reversed and so on. Ideally looking for advice in PySpark, but while I will be working in Databricks, SQL may do as well.
My silver layer looks like that
ID | Market | CK | SB1 | SB2 | SbX | ColX |
---|---|---|---|---|---|---|
1 | US | 1US | 2 | 1 | 9 | 9 |
2 | US | 2US | 2 | 2 | 9 | 9 |
3 | US | 3US | 1 | 1 | 9 | 9 |
1 | GB | 1GB | 3 | 5 | 9 | 9 |
2 | GB | 2GB | 4 | 4 | 9 | 9 |
3 | GB | 3GB | 5 | 3 | 9 | 9 |
What is expected output in that scenario I guess is (look at SB1 SB2 cols)
ID | Market | CK | SB1 | SB2 | SbX | ColX |
---|---|---|---|---|---|---|
1 | US | 1US | 2 | 1 | 9 | 9 |
2 | US | 2US | 2 | 2 | 9 | 9 |
3 | US | 3US | 1 | 1 | 9 | 9 |
1 | GB | 1GB | 5 | 3 | 9 | 9 |
2 | GB | 2GB | 4 | 4 | 9 | 9 |
3 | GB | 3GB | 3 | 5 | 9 | 9 |
Each dataset got over 50 columns, and there are almost 10 markets
Any protips, ideas how to handle that? I guess it can't be solved on ingestion level, so raw and curated zone is not the place to make it happen I guess, the curated dataset has to be transformed and values filled accordingly
Solution
You can surely use standard SQL feature - CASE..WHEN
as follows:
select id, market, ck,
case market when 'US' then SB1
when 'GB' then SB2
end as SB1,
case market when 'US' then SB2
when 'GB' then SB1
end as SB2,
sbx, colx
from your_table
Alternatively, You can also use UNION ALL
as follows:
select id, market, ck, sb1, sb2, sbx, colx from your_table where market = 'US'
UNION ALL
select id, market, ck, sb2, sb1, sbx, colx from your_table where market = 'GB'
Answered By - ORA-01017
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.