Issue
I'm a total newbie to ML and everything in it.
I have a ~15k log and my goal is to extract 3- to 8-grams from it. The code I'm using is partially adapted from this question.
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_fwf(r'C:\path\to\my\LOG.txt')
vect = CountVectorizer(ngram_range=(3, 8))
vect.fit(df)
for w in vect.get_feature_names_out():
    print(w)
The code actually runs, but I'm not able to "iterate" over the txt: the output only contains the first X n-grams, extracted from the first 2-3 lines of the log. How can I read and extract all the n-grams from the document?
EXTRA QUESTION: Since the final goal is to extract the n-grams and build a tf-idf model on them, is it a problem that my log is a TXT instead of a CSV? The lines have variable lengths, so I guess CSV is not feasible.
Solution
Use a for loop on a file object to read it line by line. Use with open(...) so a context manager ensures the file is closed after reading:
with open("log.txt") as infile:
    for line in infile:
        print(line)
Answered By - abo