Issue
I am very new to Python and would like to use it for my mass spectrometry data analysis. I have a tab-separated txt file. I can import it into Excel with the import assistant. I have also managed to import it into Spyder with the import assistant, but I would like to automate the process. Is there a way to "record" the import settings I use while manually loading the data? That way I would generate code that I could reuse for other txt files.
I've tried using NumPy and pandas to import my data but my txt file contains strings and numbers (floats) and I have not managed to tell Python to distinguish between the two.
When I import the file manually, I get the exact DataFrame I want, with the first row as a header and the strings and numbers correctly formatted.
Here is a sample of my txt file:
Protein.IDs Majority.protein.IDs Peptide.counts..all.
0 LmxM.01.0330.1-p1 LmxM.01.0330.1-p1 5
1 LmxM.01.0410.1-p1 LmxM.01.0410.1-p1 15
2 LmxM.01.0480.1-p1 LmxM.01.0480.1-p1 14
3 LmxM.01.0490.1-p1 LmxM.01.0490.1-p1 27
4 LmxM.01.0520.1-p1 LmxM.01.0520.1-p1 27
Solution
Using numpy or pandas is the best way to automate the process, so good job using the right tools.
I suggest that you look at all the options that the pandas read_csv function has to offer. There is most likely a single line of code that can import the data properly by using the right options.
In particular, look at the decimal option if the floats are not parsed correctly.
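As a rough illustration of what the decimal option changes, here is a minimal sketch. The two-row sample and the Intensity column are made up for the example (they are not from the question's file), and it assumes the floats are written with a decimal comma:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample that uses decimal commas
raw = (
    "Protein.IDs\tIntensity\n"
    "LmxM.01.0330.1-p1\t1,5\n"
    "LmxM.01.0410.1-p1\t2,25\n"
)

# Without decimal=',' the values "1,5" and "2,25" are kept as text
df_default = pd.read_csv(io.StringIO(raw), sep="\t")

# With decimal=',' the same values are parsed as floats
df_decimal = pd.read_csv(io.StringIO(raw), sep="\t", decimal=",")

print(df_default["Intensity"].dtype)
print(df_decimal["Intensity"].dtype)
```

If your file already uses a decimal point, this option should not be needed.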
Other solutions, which you may still want to use even if you use pandas properly, are:
- Formatting the input data to make your life easier, either when it is generated or with a text editor that has good macros (Notepad++ can replace expressions or repeat tedious keystrokes for you).
- Formatting the output of the pandas import. If you still have strings that should be interpreted as numeric values, you can loop over the columns to check that all values have been converted to the type they should have.
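The second option could look something like the sketch below. The helper name coerce_numeric_columns is hypothetical, and the small DataFrame only mimics the question's columns; the idea is to try pandas.to_numeric on each non-numeric column and keep the result only when no value is lost:

```python
import pandas as pd

def coerce_numeric_columns(df):
    """Return a copy in which text columns holding only numbers become numeric."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue  # already numeric, nothing to do
        converted = pd.to_numeric(out[col], errors="coerce")
        # Keep the conversion only if every non-missing value survived it
        if converted.notna().sum() == out[col].notna().sum():
            out[col] = converted
    return out

# Illustrative data: the counts were read as text by mistake
df = pd.DataFrame({
    "Protein.IDs": ["LmxM.01.0330.1-p1", "LmxM.01.0410.1-p1"],
    "Peptide.counts..all.": ["5", "15"],
})
clean = coerce_numeric_columns(df)
print(clean.dtypes)
```

Columns such as Protein.IDs are left untouched because their values cannot be converted to numbers.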
Finally, you may want to provide some examples when you ask technical questions: show an example of data, the code that you're using, and the output of your code would make answering your question easier :)
Edit:
From the data example that you posted, it seems to me that pandas should separate the data just fine and detect strings and numerical values without trouble.
Look at the sep option of read_csv. The default is ',', and you probably want to switch it to a tab: '\t'
Try this:
pandas.read_csv(my_filename, sep='\t')
You may run into header issues, which you can solve using the header and names options.
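Here is a small sketch of how sep, header and names fit together. It assumes the file has one header line followed by three tab-separated columns; the in-memory sample stands in for your file, and the replacement column names are made up:

```python
import io
import pandas as pd

# Stand-in for the txt file: header line plus tab-separated rows
sample = (
    "Protein.IDs\tMajority.protein.IDs\tPeptide.counts..all.\n"
    "LmxM.01.0330.1-p1\tLmxM.01.0330.1-p1\t5\n"
    "LmxM.01.0410.1-p1\tLmxM.01.0410.1-p1\t15\n"
)

# header=0 (the default): the first line supplies the column names
df = pd.read_csv(io.StringIO(sample), sep="\t", header=0)

# header=0 together with names: discard the file's header line
# and use your own column names instead (hypothetical names here)
renamed = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=0,
    names=["protein_ids", "majority_ids", "peptide_count"],
)

print(df.dtypes)
print(renamed.head())
```

For a real file, pass its path instead of the io.StringIO object. If the file has no header line at all, header=None with names should be the combination to try.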
Answered By - Xavier Audier