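NLTK itself does not provide the readability formulas used here. They come from textstat, a separate package (install both with `pip install nltk textstat`). The simple example below uses NLTK for tokenization and stop-word removal, and textstat to compute readability scores for the text: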
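import nltk
import textstat
from nltk.corpus import stopwords
from nltk.text import Text
from nltk.tokenize import word_tokenize

# Download the tokenizer models and stop-word list on first run
# ("punkt_tab" is needed by newer NLTK releases, "punkt" by older ones)
for resource in ("punkt", "punkt_tab", "stopwords"):
    nltk.download(resource, quiet=True)

# Load the text
text = "This is a sample text to test readability using NLTK library."

# Tokenize
tokens = word_tokenize(text)

# Remove stop words (useful for NLTK-side analysis only; the readability
# formulas below depend on full sentences, so they use the unfiltered text)
stop_words = set(stopwords.words("english"))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words]

# Wrap the filtered tokens in an NLTK Text object for further exploration
text_nltk = Text(filtered_tokens)

# Compute readability metrics on the original string
flesch_reading_ease = textstat.flesch_reading_ease(text)
automated_readability_index = textstat.automated_readability_index(text)
coleman_liau_index = textstat.coleman_liau_index(text)

# Print the results
print("Flesch Reading Ease Score:", flesch_reading_ease)
print("Automated Readability Index:", automated_readability_index)
print("Coleman-Liau Index:", coleman_liau_index)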
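Running the code prints three readability metrics for the text: the Flesch Reading Ease score, the Automated Readability Index (ARI), and the Coleman-Liau Index. A higher Flesch Reading Ease score means easier text (most prose scores roughly between 0 and 100). ARI and Coleman-Liau instead estimate the US school grade level needed to understand the text, so lower values mean easier text.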
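To make the three scores less of a black box, here is a minimal Python sketch of the standard published formulas behind them. The helper names (`naive_counts` and the three scoring functions) are hypothetical, and the syllable counter is a crude vowel-group heuristic assumed for illustration; textstat counts syllables and characters with its own rules, so its numbers will not match this sketch exactly.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Higher score = easier text
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def automated_readability_index(chars: int, words: int, sentences: int) -> float:
    # Approximate US grade level; chars = number of letters/digits in the words
    return 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43


def coleman_liau_index(letters: int, words: int, sentences: int) -> float:
    # L = letters per 100 words, S = sentences per 100 words
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8


def naive_counts(text: str) -> dict:
    # Hypothetical helper: a sentence ends at a run of . ! ?,
    # a word is a run of ASCII letters
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    letters = sum(len(w) for w in words)
    # Assumed heuristic: one syllable per vowel group, at least one per word
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return {"sentences": sentences, "words": len(words),
            "letters": letters, "syllables": syllables}


text = "This is a sample text to test readability using NLTK library."
c = naive_counts(text)
print("Flesch Reading Ease:", round(flesch_reading_ease(c["words"], c["sentences"], c["syllables"]), 2))
print("ARI:", round(automated_readability_index(c["letters"], c["words"], c["sentences"]), 2))
print("Coleman-Liau:", round(coleman_liau_index(c["letters"], c["words"], c["sentences"]), 2))
```

Keeping the formulas separate from the counting makes it easy to see that all three metrics are driven by the same few quantities: average sentence length plus either syllables or letters per word. Swapping in a better syllable counter changes only `naive_counts`.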