Issue
I am trying to run a code using Python's twint library (Twitter scraper) in Colab.
My code is:
!pip install twint
!pip install nest_asyncio
!pip install pandas
import twint
import nest_asyncio
nest_asyncio.apply()
import time
import pandas as pd
import os
import re
timestr = time.strftime("%Y%m%d")
c = twint.Config()
c.Limit = 1000
c.Lang = "en"
c.Store_csv = True
c.Search = "apple"
c.Output = timestr + "_en_apple.csv"
twint.run.Search(c)
The above code works perfectly in Jupyter on my machine and fetches tweets. However, the same code in Colab results in the following:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 1.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 8.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 27.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 64.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 125.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 216.0 secs
How can this be fixed in Colab?
Solution
I got the following to work in Google Colab. Installing from requirements.txt is less hassle.
!git clone --depth=1 https://github.com/twintproject/twint.git
!cd /content/twint && pip3 install . -r requirements.txt
import twint
import nest_asyncio
nest_asyncio.apply()
import time
import pandas as pd
import os
import re
timestr = time.strftime("%Y%m%d")
c = twint.Config()
c.Limit = 1000
c.Lang = "en"
c.Store_csv = True
c.Search = "apple"
c.Output = timestr + "_en_apple.csv"
twint.run.Search(c)
Answered By - Andrew Chisholm
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.