Thursday, January 27, 2022

[FIXED] TypeError: 'NoneType' object is not iterable - text summarisation with keras


Issue

I am new to machine learning, and I am trying to work my way through a tutorial for text summarization using Keras.

I have reached the point of vectorizing the data, but I am getting an error that I have not been able to resolve on my own. I would really like to get this program working, and I was hoping someone could shed some light on why it raises this error and how I can fix it. I looked at previous posts, but none have helped so far. Thanks. Here is my code:

#vectorise data
input_texts = []
target_texts = []
input_characters = set()
target_characters = set()

for story in stories:
    input_text = story['story']
    for highlight in story['highlights']:
        target_text = highlight
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])
print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)

This is the line that throws the error:

for highlight in story['highlights']:

This is the code that I used to clean and pickle the data:

#remove all unneeded features and null values
reviews = reviews.dropna()
reviews = reviews.drop(['Id','ProductId','UserId','ProfileName','HelpfulnessNumerator','HelpfulnessDenominator', 'Score','Time'], axis=1)
reviews = reviews.reset_index(drop=True) 
print(reviews.head())

for i in range(5):
    print("Review #", i+1)
    print(reviews.Summary[i])
    print(reviews.Text[i])
    print()

#define contractions eg slang words and their correct spellings
contractions = {
        "ain't": "am not",
        "aren't": "are not",
        "can't": "cannot",
        "can't've": "cannot have",
        "'cause": "because",
        "could've": "could have",
        "couldn't": "could not",
        "couldn't've": "could not have",
        "didn't": "did not",
        "doesn't": "does not",
        "don't": "do not",
        "hadn't": "had not",
        "hadn't've": "had not have",
        "hasn't": "has not",
        "haven't": "have not",
        "he'd": "he would",
        "he'd've": "he would have"}

#clean the text of contractions and stop words 
def clean_text(text, remove_stopwords=True): 
    text = text.lower() 
    if True: 
        text = text.split() 
        new_text = []
        for word in text:
            if word in contractions:new_text.append(contractions[word])
            else:
                new_text.append(word)
            text = " ".join(new_text)
            text = re.sub(r'https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
            text = re.sub(r'\<a href', ' ', text)
            text = re.sub(r'&amp;', '', text)
            text = re.sub(r'[_"\-;%()|+&=*%.,!?:#$@\[\]/]', ' ', text)
            text = re.sub(r'<br />', ' ', text)
            text = re.sub(r'\'', ' ', text)
        if remove_stopwords:
            text = text.split()
            stops = set(stopwords.words("english"))
            text = [w for w in text if not w in stops]
            text = " ".join(text) 
            return text

#clean summaries and texts
clean_summaries = []
for summary in reviews.Summary:
    clean_summaries.append(clean_text(summary, remove_stopwords=False)) 
print("Summaries are complete.")

clean_texts = []
for text in reviews.Text:
    clean_texts.append(clean_text(text))
print("Texts are complete.")

stories = list()
for i, text in enumerate(clean_texts):
    stories.append({'story': text, 'highlights': clean_summaries[i]}) # save to file
dump(stories, open('data/review_dataset.pkl', 'wb'))

Solution

It seems that at least one of your story dictionaries has None stored under the key 'highlights' (a missing key would raise a KeyError instead). If this is only true for certain stories, you can check for None before iterating. If it is true for all stories, there might be a discrepancy between your code and the data you are working with.
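To see which of the two cases applies, it can help to count the affected entries first. The sketch below assumes stories is the list of dictionaries built in the question; the helper name find_missing_highlights and the sample data are hypothetical and only serve as an illustration.

```python
def find_missing_highlights(stories):
    """Return the indices of stories whose 'highlights' value is missing or None."""
    # dict.get returns None for an absent key, so both cases are caught
    return [i for i, story in enumerate(stories) if story.get('highlights') is None]


# Hypothetical sample data, standing in for the unpickled review dataset
sample_stories = [
    {'story': 'great dog food', 'highlights': 'good quality'},
    {'story': 'not as advertised', 'highlights': None},
]

missing = find_missing_highlights(sample_stories)
print(len(missing), 'of', len(sample_stories), 'stories have no highlights')
```

If the count equals the number of stories, the problem is more likely in the code that produced the data than in a few bad rows.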

Also, I believe there is an indentation error (it might just be Stack Overflow formatting): the code after target_text = highlight should be indented one more level to the right.

for story in stories:
    input_text = story['story']
    # check for None to make sure you are not iterating over NoneType
    if story['highlights'] is not None:
        for highlight in story['highlights']:
            target_text = highlight
            # I believe the following code should be indented as well
            target_text = '\t' + target_text + '\n'
            input_texts.append(input_text)
            target_texts.append(target_text)
            ...
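
Going only by the code as posted in the question, one likely source of the None values is clean_text itself: its return statement sits inside the "if remove_stopwords:" block, so a call with remove_stopwords=False falls off the end of the function and returns None, and that is how every summary is cleaned. The following is a minimal sketch of a version that always returns a string, assuming the posted indentation matches the real file. The trimmed contractions dictionary and the stops parameter (a stand-in for NLTK's stopwords.words("english")) are assumptions made to keep the example self-contained, and the regular expressions are simplified.

```python
import re

# Trimmed stand-in for the contractions dictionary in the question
contractions = {"can't": "cannot", "don't": "do not", "didn't": "did not"}

# Hypothetical stop-word set; the question uses nltk's stopwords.words("english")
DEFAULT_STOPS = frozenset({"the", "a", "an", "is", "it", "this"})


def clean_text(text, remove_stopwords=True, stops=DEFAULT_STOPS):
    """Lower-case text, expand contractions, strip URLs and punctuation."""
    words = [contractions.get(word, word) for word in text.lower().split()]
    text = " ".join(words)  # join once, after the loop over all words
    text = re.sub(r'https?://\S+', '', text)
    text = re.sub(r'<br />|<a href', ' ', text)
    text = re.sub(r'&amp;', '', text)
    text = re.sub(r'[_"\-;%()|+&=*.,!?:#$@\[\]/]', ' ', text)
    text = re.sub(r"'", ' ', text)
    if remove_stopwords:
        text = " ".join(w for w in text.split() if w not in stops)
    # the return is outside the if-block, so both code paths yield a string
    return text


summary = clean_text("Don't buy this!", remove_stopwords=False)
print(repr(summary))
```

Note also that with this pipeline each 'highlights' value is a single string, so "for highlight in story['highlights']" walks over it one character at a time. If one target per review is intended, using the string directly may be closer to the goal.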

Answered By - Chris Graf

This answer was collected from Stack Overflow, tested by PythonFixing community admins, and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
