Issue
I need to split a document path into the folder name and the document name in Python. The dataframe is large, with many rows. If a path has no document name at the end, the document name column should be left blank in the result. For example, I have a dataframe like the following:
no filename
1 \\apple\config.csv
2 \\apple\fox.pdf
3 \\orange\cat.xls
4 \\banana\eggplant.pdf
5 \\lucy
...
I expect the output shown as follows:
foldername documentname
\\apple config.csv
\\apple fox.pdf
\\orange cat.xls
\\banana eggplant.pdf
\\lucy
...
I have tried the following code, but it does not work.
y={'Foldername':[],'Docname':[]}
def splitnames(x):
    if "." in x:
        docname=os.path.basename(x)
        rm="\\"+docname
        newur=x.replace(rm,'')
    else:
        newur=x
        docname=""
    result=[newur,docname]
    y["Foldername"].append(result[0])
    y["Docname"].append(result[1])
    return y;
dff=df$filename.apply(splitnames)
Thank you so much for the help!!
Solution
Not sure how you're getting the paths, but you could convert them to pathlib path objects and use their attributes (parent, name and suffix) to grab the folder name and the file name:
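from io import StringIO
import numpy as np
import pandas as pd
from pathlib import PureWindowsPath
# Non-raw string: each "\\" below is a single backslash in the data.
data = """ no filename
1 \\apple\\config.csv
2 \\apple\\fox.pdf
3 \\orange\\cat.xls
4 \\banana\\eggplant.pdf
5 \\lucy"""
df = pd.read_csv(StringIO(data), sep=r'\s+')
# PureWindowsPath splits on backslashes on any OS (plain Path only does so on Windows).
df['filename'] = df['filename'].apply(PureWindowsPath)
# A path with a suffix is treated as a document; otherwise the whole path is the folder.
df['folder'] = df['filename'].apply(lambda x : x.parent if '.' in x.suffix else x)
df['document_name'] = df['filename'].apply(lambda x : x.name if '.' in x.suffix else np.nan)
print(df)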
no filename folder document_name
0 1 \apple\config.csv \apple config.csv
1 2 \apple\fox.pdf \apple fox.pdf
2 3 \orange\cat.xls \orange cat.xls
3 4 \banana\eggplant.pdf \banana eggplant.pdf
4 5 \lucy \lucy NaN
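If the paths really start with two backslashes, as written in the question, and the document name should be blank rather than NaN, a string-based variation may be easier. The following is only a sketch under those assumptions: the sample dataframe is rebuilt by hand, and the names pattern and parts are made up for illustration. It uses Series.str.extract with a regular expression, so it avoids calling a Python function per row, which may help on a large dataframe.

```python
import pandas as pd

# Sample paths as written in the question (raw strings keep both backslashes).
df = pd.DataFrame({
    "no": [1, 2, 3, 4, 5],
    "filename": [
        r"\\apple\config.csv",
        r"\\apple\fox.pdf",
        r"\\orange\cat.xls",
        r"\\banana\eggplant.pdf",
        r"\\lucy",
    ],
})

# Everything before the last backslash is the folder; the last component
# counts as a document name only if it contains a dot.
pattern = r"^(?P<foldername>.*)\\(?P<documentname>[^\\]*\.[^\\]*)$"
parts = df["filename"].str.extract(pattern)

# Rows without a document name do not match the pattern and come back as NaN:
# keep the whole path as the folder and leave the document name blank.
parts["foldername"] = parts["foldername"].fillna(df["filename"])
parts["documentname"] = parts["documentname"].fillna("")

print(parts)
```

Like the pathlib version, this treats any last component containing a dot as a document, so a folder whose name contains a dot would be misclassified.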
Answered By - Umar.H