I had this same problem. Go into a Python shell and type:
>>> import nltk
>>> nltk.download()
You can download the punkt package like this:

import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize
This is also recommended in the error message in more recent versions:
LookupError:
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
    - ''
**********************************************************************
If you do not pass any argument to the download function, it downloads all packages, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, tokenizers.
nltk.download()
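If you only need punkt, a common pattern is to skip the download when the resource is already installed. A minimal sketch, relying on nltk.data.find, which raises LookupError when a resource is missing:

import nltk

# Download punkt only if it is not already present in one of the
# directories on nltk's search path.
try:
    nltk.data.find('tokenizers/punkt')
except LookupError:
    nltk.download('punkt')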
This is what worked for me just now:
# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))
sentences_tokenized is a list of lists of tokens:
[
['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']
]
From the bash command line, run:

$ python -c "import nltk; nltk.download('punkt')"
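The downloader can also be invoked as a module from the shell; a sketch of the equivalent call (the optional -d flag selects a download directory):

$ python -m nltk.downloader punkt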
This works for me:

>>> import nltk
>>> nltk.download()
Step 3: copy-paste the following code and execute it.

from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize

wordToken = []
for sent in sentences:
    subSentToken = []
    for subSent in tokenizer.tokenize(sent):
        subSentToken.extend([token for token in treebank_word_tokenize(subSent)])
    wordToken.append(subSentToken)

for token in wordToken:
    print(token)
A simple nltk.download() will not solve this issue. I tried the following and it worked for me: in the nltk folder, create a tokenizers folder and copy your punkt folder into the tokenizers folder.

NOTE: If you don't need to load the data at runtime or bundle the data with your code, it is best to create your nltk_data folders at the built-in locations that nltk looks for.
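To see which built-in locations those are on your machine, you can print nltk's search path; a minimal sketch:

import nltk

# nltk.data.path lists the directories NLTK searches for resources, in
# order; placing your punkt folder under any of them avoids downloads.
print(nltk.data.path)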
When trying to load the punkt tokenizer…
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
LookupError:
**********************************************************************
  Resource 'tokenizers/punkt/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource: nltk.download().
  Searched in:
    - 'C:\\Users\\Martinos/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'E:\\Python26\\nltk_data'
    - 'E:\\Python26\\lib\\nltk_data'
    - 'C:\\Users\\Martinos\\AppData\\Roaming\\nltk_data'
**********************************************************************
The main reason you see that error is that nltk could not find the punkt package. Due to the size of the nltk suite, not all available packages are downloaded by default when you install it. You can download it via nltk.download(); an installation window then appears. Go to the 'Models' tab and select 'punkt' under the 'Identifier' column, then click Download and it will install the necessary files. Then it should work!

The download function saves packages to a specific directory; you can find that directory location in the comments here: https://github.com/nltk/nltk/blob/67ad86524d42a3a86b1f5983868fd2990b59f1ba/nltk/downloader.py#L1051
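To confirm where the package actually landed after downloading, nltk.data.find returns the resolved path; a small sketch:

import nltk

# Returns the on-disk location of the installed resource, or raises
# LookupError if it is still missing.
print(nltk.data.find('tokenizers/punkt'))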
My Code:
import nltk.data
tokenizer = nltk.data.load("nltk:tokenizers/punkt/english.pickle")
Error message:
[[email protected] sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
File "mapper_local_v1.0.py", line 16, in <module>
tokenizer = nltk.data.load("nltk:tokenizers/punkt/english.pickle")
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load
opened_resource = _open(resource_url)
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open
return find(path_, path + [""]).open()
File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find
raise LookupError(resource_not_found)
LookupError:
Resource u"tokenizers/punkt/english.pickle" not found. Please
use the NLTK Downloader to obtain the resource:
>>> nltk.download()
Searched in:
- "/home/ec2-user/nltk_data"
- "/usr/share/nltk_data"
- "/usr/local/share/nltk_data"
- "/usr/lib/nltk_data"
- "/usr/local/lib/nltk_data"
- u""
As per the error message, I logged into the Python shell on my Unix machine and then used the commands below:
import nltk
nltk.download()
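On a headless machine where the Tkinter downloader window cannot open, a text-mode alternative exists; a sketch (download_shell is the interactive console variant of the downloader):

import nltk

# Interactive text-mode downloader for servers without a display.
nltk.download_shell()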
list.index(x[, start[, end]])

The optional arguments start and end are interpreted as in slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the start argument.

An index call checks every element of the list in order until it finds a match. If your list is long and you don't know roughly where in the list the item occurs, this search can become a bottleneck. In that case, you should consider a different data structure. Note that if you know roughly where to find the match, you can give index a hint. For instance, in this snippet, l.index(999_999, 999_990, 1_000_000) is roughly five orders of magnitude faster than a plain l.index(999_999), because the former only has to search 10 entries, while the latter searches a million:
>>> import timeit
>>> timeit.timeit("l.index(999_999)", setup="l = list(range(0, 1_000_000))", number=1000)
9.356267921015387
>>> timeit.timeit("l.index(999_999, 999_990, 1_000_000)", setup="l = list(range(0, 1_000_000))", number=1000)
0.0004404920036904514
>>> ["foo", "bar", "baz"].index("bar")
1
A call to index results in a ValueError if the item is not present:
>>> [1, 1].index(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 2 is not in list
One thing that is really helpful in learning Python is to use the interactive help function:
>>> help(["foo", "bar", "baz"])
Help on list object:
class list(object)
...
 |
 |  index(...)
 |      L.index(value, [start, [stop]]) -> integer -- return first index of value
 |
The majority of answers explain how to find a single index, but their methods do not return multiple indexes if the item is in the list multiple times. Use enumerate():
for i, j in enumerate(["foo", "bar", "baz"]):
    if j == "bar":
        print(i)
As a list comprehension:

[i for i, j in enumerate(["foo", "bar", "baz"]) if j == "bar"]
Here"s also another small solution with itertools.count()
(which is pretty much the same approach as enumerate):
from itertools import izip as zip, count # izip for maximum efficiency [i for i, j in zip(count(), ["foo", "bar", "baz"]) if j == "bar"]
Here"s my code:
def front_back(a, b):
#++ + your code here++ +
if len(a) % 2 == 0 && len(b) % 2 == 0:
return a[: (len(a) / 2)] + b[: (len(b) / 2)] + a[(len(a) / 2): ] + b[(len(b) / 2): ]
else:
#todo!Not yet done.: P
return
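A possible completion of the odd-length branch, as a sketch: it assumes the usual convention that the front half takes the extra character when a string's length is odd (that convention is my assumption, not stated above):

def front_back(a, b):
    # Front half gets the extra character when the length is odd.
    mid_a = (len(a) + 1) // 2
    mid_b = (len(b) + 1) // 2
    return a[:mid_a] + b[:mid_b] + a[mid_a:] + b[mid_b:]

print(front_back("abcd", "xyz"))  # 'abxycdz'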
If you did not install the data to one of the central locations above, you will need to set the NLTK_DATA environment variable to specify the location of the data. (On a Windows machine, right-click on "My Computer", then select Properties > Advanced > Environment Variables > User Variables > New...) Set your NLTK_DATA environment variable to point to your top-level nltk_data folder.

>>> import nltk
>>> nltk.download()

Test the installation: check that the user environment and privileges are set correctly by logging in to a user account, starting the Python interpreter, and accessing the Brown Corpus:

>>> from nltk.corpus import brown
>>> brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

If your web connection uses a proxy server, you should specify the proxy address as follows. In the case of an authenticating proxy, specify a username and password. If the proxy is set to None, this function will attempt to detect the system proxy.

>>> nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
>>> nltk.download()
"NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, and an active discussion forum." - from http://www.nltk.org/.,Once installed we need to test NLTK. As listed in the previous section, the first thing to do is if we can import NLTK:,During the test, we may get the following error message:,Now, we'll install NLTK on Ubuntu 14.04. The following steps are from Installing NLTK:
Once installed we need to test NLTK. As listed in the previous section, the first thing to do is if we can import NLTK:
>>> import nltk
>>> sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
>>> tagged = nltk.pos_tag(tokens)
>>> tagged[0:6]
[('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN')]
During the test, we may get the following error message:
LookupError:
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> nltk.download()

  Searched in:
    - '/home/k/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
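Given that message, the direct fix is to fetch the tagger it names; a sketch (the identifier is taken from the resource path in the error; newer NLTK versions ship 'averaged_perceptron_tagger' instead):

import nltk

# Fetch the tagger named in the error message above.
nltk.download('maxent_treebank_pos_tagger')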
I'm implementing a custom package with AI Fabric. In the preprocessing stage I'm using the nltk library to tokenize sentences and delete stop words, and I download the required NLTK packages within my Python code.

Inside your ML Package, create a folder, for example nltk_data, and download the punkt and stopwords packages locally into this folder using:
import nltk
nltk.download('punkt', download_dir="<MLPackagedirectory>/nltk_data")
nltk.download('stopwords', download_dir="<MLPackagedirectory>/nltk_data")
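Then, in the main.py file (for example, as the first line of the init function), include this line to add the new directory to the nltk path (the placeholder must match the download_dir used above):

import nltk

# Make NLTK search the bundled folder; '<MLPackagedirectory>' is the
# same placeholder as in the download calls above.
nltk.data.path.append("<MLPackagedirectory>/nltk_data")

Now you won't need to download the data anymore; it will be there locally, so you won't have this issue.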