Sunday, July 17, 2022

[FIXED] Python LDA gensim "DeprecationWarning: invalid escape sequence"

July 17, 2022 deprecation-warning, export-to-csv, gensim, python-3.x, r No comments

Issue

I am new to stackoverflow and python so please bear with me. I am trying to run an Latent Dirichlet Analysis on a text corpora with the gensim package in python using PyCharm editor. I prepared the corpora in R and exported it to a csv file using this R command:

write.csv(testdf, "C://...//test.csv", fileEncoding = "utf-8")

Which creates the following csv structure (though with much longer and already preprocessed texts):

,"datetimestamp","id","origin","text"
1,"1960-01-01","id_1","Newspaper1","Test text one"
2,"1960-01-02","id_2","Newspaper1","Another text"
3,"1960-01-03","id_3","Newspaper1","Yet another text"
4,"1960-01-04","id_4","Newspaper2","Four Five Six"
5,"1960-01-05","id_5","Newspaper2","Alpha Bravo Charly"
6,"1960-01-06","id_6","Newspaper2","Singing Dancing Laughing"

I then try the following essential python code (based on the gensim tutorials) to perform simple LDA analysis:

import gensim
from gensim import corpora, models, similarities, parsing
import pandas as pd
from six import iteritems
import os
import pyLDAvis.gensim

class MyCorpus(object):
     def __iter__(self):
             for row in pd.read_csv('//mpifg.local/dfs/home/lu/Meine Daten/Imagined Futures and Greek State Bonds/Topic Modelling/Python/test.csv', index_col=False, header = 0 ,encoding='utf-8')['text']:
                 # assume there's one document per line, tokens separated by whitespace
                 yield dictionary.doc2bow(row.split())

if __name__ == '__main__':
    dictionary = corpora.Dictionary(row.split() for row in pd.read_csv(
        '//.../test.csv', index_col=False, encoding='utf-8')['text'])
    print(dictionary)
    dictionary.save(
        '//.../greekdict.dict')  # store the dictionary, for future reference

    ## create an mmCorpus
    corpora.MmCorpus.serialize('//.../greekcorpus.mm', MyCorpus())
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    dictionary = corpora.Dictionary.load('//.../greekdict.dict')
    corpus = corpora.MmCorpus('//.../greekcorpus.mm')

    # train model
    lda = gensim.models.ldamodel.LdaModel(corpus=corpus, id2word=dictionary, num_topics=50, iterations=1000)

I get the following error codes and the code exits:

...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:832: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2736: DeprecationWarning: invalid escape sequence \d

\...\Python\venv\lib\site-packages\setuptools-28.8.0-py3.6.egg\pkg_resources_vendor\pyparsing.py:2914: DeprecationWarning: invalid escape sequence \g

\...\Python\venv\lib\site-packages\pyLDAvis_prepare.py:387: DeprecationWarning: .ix is deprecated. Please use .loc for label based indexing or .iloc for positional indexing

I cannot find any solution and to be honest neither have any clue where exactly the problem comes from. I spent hours making sure that the encoding of the csv is utf-8 and exported (from R) and imported (in python) correctly.

What am I doing wrong or where else could I look at? Cheers!

Solution

DeprecationWarining is exactly that - warning about a feature being deprecated which is supposed to prompt the user to use some other functionality instead to maintain the compatibility in the future. So in your case I would just watch for the update of libraries that you use.

Starting with the last warning it look like it is originating from pandas and has been logged against pyLDAvis here.

The remaining ones come from pyparsing module but it does not seem that you are importing it explicitly. Maybe one of the libraries you use has a dependency and uses some relatively old and deprecated functionality. To eradicate the warning for the start I would check if upgrading does not help. Good luck!

Answered By - sophros

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, July 17, 2022

[FIXED] Python LDA gensim "DeprecationWarning: invalid escape sequence"

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels