Issue
I have a CSV file with 15 rows of 50 values each, i.e. 50 columns. The first row is the header and holds the labels/names for the values.
The file looks like this (50 columns and 15 rows in total; some values are nan):
label1, label2, label3, ..., label50
0123, 345, nan, ..., 287
4324, nan, 343, ..., 362
...
I want to plot the values of each column vertically: with 15 rows including the header, that gives 14 values stacked above a single x position, which is the column's label. The x-axis is therefore discrete, with the label names as its values.
One approach works, but only for boxplots and not for scatter points (for images see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.boxplot.html):
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("data.csv", delimiter=",")
df.plot.box() # plots boxplot with discrete x-axis values as labels
plt.xticks(rotation=90, ha='right') # rotate the label names by 90 degrees on the x-axis
plt.yscale('log') # log scale for my dataset
plt.show()
I would like the same result as the boxplot gives, but instead of boxes I want to see every point of each column distributed vertically and, if possible, every row of the CSV in a unique color so the rows can be told apart in the diagram. (One row is one "combination" of data points.)
As a beginner, I haven't found a solution yet...
Thanks a lot in advance. Feel free to ask if anything in my explanation is unclear.
Solution
You could try pandas' parallel_coordinates. You'll need to add an extra column to give each row a unique label. You can remove the linestyle and use a dot as marker:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 100, (15, 50)).astype(float), columns=[f'lbl{i}' for i in range(1, 51)])
df['Name'] = [f'row{i}' for i in range(len(df))]
fig, ax = plt.subplots(figsize=(25, 8))
pd.plotting.parallel_coordinates(df, 'Name', color=plt.cm.tab20(np.arange(len(df))), ls='', marker='o', ax=ax)
ax.legend(bbox_to_anchor=(1.01, 1.02), loc='upper left')
plt.tight_layout()
plt.show()
PS: You can use pd.plotting.parallel_coordinates(..., axvlines=False) and ax.grid(False, axis='x') if the vertical lines aren't desired. ax.tick_params(axis='x', rotation=30) would rotate the x-labels by 30 degrees.
Here is another example, which also sets a small margin on the left and right.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1, 10, (15, 50)).astype(float).cumsum(axis=0), columns=[f'lbl{i}' for i in range(1, 51)])
df['Name'] = [f'row{i}' for i in range(len(df))]
fig, ax = plt.subplots(figsize=(15, 8))
pd.plotting.parallel_coordinates(df, 'Name', color=plt.cm.turbo(np.linspace(0, 1, len(df)) ), ls='', marker='o', axvlines=False, ax=ax)
ax.legend(bbox_to_anchor=(1.01, 1.02), loc='upper left')
ax.grid(False)
ax.tick_params(axis='x', rotation=30)
ax.autoscale()
ax.margins(x=0.01)
plt.tight_layout()
plt.show()
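Both examples above use random data. As a rough sketch of how the same idea might be applied to a file like the one in the question (nan entries, log-scaled y-axis), the snippet below reads a small inline CSV instead of data.csv. The csv_text sample and the helper name plot_rows_as_points are made up for illustration, and skipinitialspace=True is an assumption based on the spaces after the commas in the sample file.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 3 labels, 4 data rows, two nan entries.
csv_text = """label1, label2, label3
123, 345, nan
4324, nan, 343
87, 1200, 15
950, 60, 7800
"""


def plot_rows_as_points(df, ax=None):
    """Draw each column's values as dots above its label, one color per row."""
    data = df.copy()
    # parallel_coordinates needs a class column; give every row its own name.
    data['Name'] = [f'row{i}' for i in range(len(data))]
    if ax is None:
        _, ax = plt.subplots(figsize=(15, 8))
    # tab20 offers 20 distinct colors; wrap around if there are more rows.
    rgba = plt.cm.tab20(np.arange(len(data)) % 20)
    colors = [tuple(map(float, c)) for c in rgba]
    pd.plotting.parallel_coordinates(data, 'Name', color=colors, ls='',
                                     marker='o', axvlines=False, ax=ax)
    ax.set_yscale('log')                    # log scale, as in the question
    ax.grid(False, axis='x')                # no vertical grid lines
    ax.tick_params(axis='x', rotation=90)   # vertical label names
    ax.legend(bbox_to_anchor=(1.01, 1.02), loc='upper left')
    return ax


# skipinitialspace=True strips the blank after each comma, so that
# " nan" is recognised as a missing value and the labels have no leading space.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
ax = plot_rows_as_points(df)
plt.tight_layout()
```

Calling plt.show() afterwards displays the figure. Points whose value is nan are simply not drawn, so no extra filtering should be needed; for the real file, pd.read_csv("data.csv", skipinitialspace=True) would replace the io.StringIO call.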
Answered By - JohanC