impyla error when port is specified


I am using impyla 0.9.0. If I specify the port in the connect call:

conn = impala.dbapi.connect(host='n1', port=21000)

I get the following error:

Traceback (most recent call last):
  File "./myquery.py", line 78, in <module>
    main(len(sys.argv), sys.argv)
  File "./myquery.py", line 58, in main
    cur = conn.cursor()
  File "/usr/lib/python2.6/site-packages/impala/dbapi/hiveserver2.py", line 55, in cursor
    rpc.open_session(self.service, user, configuration))
  File "/usr/lib/python2.6/site-packages/impala/_rpc/hiveserver2.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/impala/_rpc/hiveserver2.py", line 214, in open_session
    resp = service.OpenSession(req)
  File "/usr/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 175, in OpenSession
    return self.recv_OpenSession()
  File "/usr/lib/python2.6/site-packages/impala/_thrift_gen/TCLIService/TCLIService.py", line 191, in recv_OpenSession
    raise x
thrift.Thrift.TApplicationException: Invalid method name: 'OpenSession'

But the port is valid, since impala-shell connects to it:

impala-shell -i n1:21000
Starting Impala Shell without Kerberos authentication
Connected to n1:21000
Server version: impalad version 2.1.1-cdh5 RELEASE (build 7901877736e29716147c4804b0841afc4ebc9037)
Welcome to the Impala shell. Press TAB twice to see a list of available commands.

Copyright (c) 2012 Cloudera, Inc. All rights reserved.

(Shell build version: Impala Shell v2.1.1-cdh5 (7901877) built on Tue Jan 27 16:23:42 PST 2015)
[n1:21000] >
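
One likely explanation, offered as an assumption rather than a confirmed diagnosis: on a default Impala install, port 21000 serves the Beeswax protocol that impala-shell speaks, while impyla speaks HiveServer2, which impalad serves on port 21050 by default. An OpenSession call sent to the Beeswax port would be rejected with exactly this kind of "Invalid method name" reply. The sketch below encodes those default ports in two hypothetical helpers (describe_impala_port and connect_hs2 are illustrative names, not part of impyla), and a cluster may be configured with different ports.

```python
# Default impalad client ports (assumed defaults; a cluster may override them).
DEFAULT_IMPALA_PORTS = {
    21000: "beeswax",      # protocol used by impala-shell
    21050: "hiveserver2",  # protocol used by impyla
}


def describe_impala_port(port):
    """Hypothetical helper: name the protocol a default impalad port speaks."""
    return DEFAULT_IMPALA_PORTS.get(port, "unknown")


def connect_hs2(host, port=21050, **kwargs):
    """Connect with impyla, refusing the default Beeswax port up front."""
    if describe_impala_port(port) == "beeswax":
        raise ValueError(
            "port %d is the default Beeswax port; impyla needs the "
            "HiveServer2 port (21050 by default)" % port)
    # Imported lazily so the port check itself works without impyla installed.
    from impala.dbapi import connect
    return connect(host=host, port=port, **kwargs)
```

With this in place, connect_hs2('n1', port=21000) fails immediately with a readable message instead of a Thrift exception at cursor() time, and connect_hs2('n1') tries the default HiveServer2 port.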

Suggestion : 2

There seem to be different versions of thrift-sasl and impyla that work or don't work together, and it is not easy to figure out these version mismatches. So we finally abandoned impyla and went with pyodbc and the Cloudera Impala ODBC driver, which is easier to get working and has worked well so far. Check out this link: https://plenium.wordpress.com/2020/05/04/use-pyodbc-with-cloudera-impala-odbc-and-kerberos/

I've been getting the same error when trying to connect to the Impala instance on a kerberized cluster! Any particular reason why we get this?

@JasonBourne - if you have the same issue, here's a GitHub issue discussing it and linking to a pull request to fix it: https://github.com/cloudera/thrift_sasl/issues/28 You can see in the commits (here: https://github.com/cloudera/thrift_sasl/commits/master) that they are testing a new release for a fix, but it looks like it's not quite done yet. Hopefully soon.

Tried:

from impala.dbapi import connect
conn = connect(host='my.impala.host', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM youval_db.accounts_info LIMIT 10')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()

Also tried with 

conn = connect()
---------------------------------------------------------------------------
HiveServer2Error                          Traceback (most recent call last)
<ipython-input-13-82112a6ffca2> in <module>()
      2 conn = connect(host='myhost', port=21050)
      3 
----> 4 cursor = conn.cursor()
      5 cursor.execute('SELECT * FROM default.testtable')
      6 print (cursor.description)  # prints the result set's schema

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in cursor(self, user, configuration, convert_types, dictify, fetch_error)
    122         log.debug('.cursor(): getting new session_handle')
    123 
--> 124         session = self.service.open_session(user, configuration)
    125 
    126         log.debug('HiveServer2Cursor(service=%s, session_handle=%s, '

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in open_session(self, user, configuration)
   1062                               username=user,
   1063                               configuration=configuration)
-> 1064         resp = self._rpc('OpenSession', req)
   1065         return HS2Session(self, resp.sessionHandle,
   1066                           resp.configuration,

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
    990     def _rpc(self, func_name, request):
    991         self._log_request(func_name, request)
--> 992         response = self._execute(func_name, request)
    993         self._log_response(func_name, response)
    994         err_if_rpc_not_ok(response)

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _execute(self, func_name, request)
   1021 
   1022         raise HiveServer2Error('Failed after retrying {0} times'
-> 1023                                .format(self.retries))
   1024 
   1025     def _operation(self, kind, request):

HiveServer2Error: Failed after retrying 3 times

/data/opt/anaconda3/lib/python3.7/site-packages/thrift_sasl/__init__.py in open(self)
     65 
     66   def open(self):
---> 67     if not self._trans.isOpen():
     68       self._trans.open()
     69 

AttributeError: 'TSocket' object has no attribute 'isOpen'

The hang seems to be in the statement buff = self.sock.recv(sz):

/data/opt/anaconda3/lib/python3.7/site-packages/thriftpy2/transport/socket.py in read(self, sz)
    107         while True:
    108             try:
--> 109                 buff = self.sock.recv(sz)
    110             except socket.error as e:
    111                 if e.errno == errno.EINTR:

KeyboardInterrupt:

After trying various options and setting timeout=100 in the connect call, the script queries the Impala table successfully, but every second or third run it fails with the error below:

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in _rpc(self, func_name, request)
    992         response = self._execute(func_name, request)
    993         self._log_response(func_name, response)
--> 994         err_if_rpc_not_ok(response)
    995         return response
    996 

/data/opt/anaconda3/lib/python3.7/site-packages/impala/hiveserver2.py in err_if_rpc_not_ok(resp)
    746             resp.status.statusCode != TStatusCode.SUCCESS_WITH_INFO_STATUS and
    747             resp.status.statusCode != TStatusCode.STILL_EXECUTING_STATUS):
--> 748         raise HiveServer2Error(resp.status.errorMessage)
    749 
    750 

HiveServer2Error: Invalid query handle: b14cce8e19xxxx:5b51463xxxx
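
Since the failure is intermittent, one stopgap is to retry the whole query on a fresh connection. The sketch below is a generic wrapper, not a fix for the underlying cause: run_query_with_retry is a hypothetical helper, connect_fn is any zero-argument callable returning a DB-API connection, and the retry count and delay are arbitrary.

```python
import time


def run_query_with_retry(connect_fn, sql, attempts=3, delay_seconds=1.0,
                         retry_on=(Exception,)):
    """Run sql on a fresh connection, retrying when a retry_on error is raised.

    connect_fn is a zero-argument callable returning a DB-API connection, e.g.
    lambda: connect(host='myhost', port=21050, timeout=100)
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        conn = None
        try:
            conn = connect_fn()          # new connection for every attempt
            cursor = conn.cursor()
            cursor.execute(sql)
            return cursor.fetchall()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
        finally:
            if conn is not None:
                conn.close()             # always release the connection
    raise last_error
```

Passing a narrower retry_on tuple, for example the HiveServer2Error class shown in the traceback (assumed to be importable from impala.error), keeps unrelated errors such as SQL syntax mistakes from being retried.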

Suggestion : 3

These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.

Using a CAST() function to convert large literal values to smaller types, or to convert special values such as NaN or Inf, produces values not consistent with other database systems. This could lead to unexpected results from queries.

These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types and file formats.

These issues can prevent one or more Impala-related daemons from starting properly.

Impala could encounter a serious error due to resource usage under very high concurrency. The error message is similar to:

F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'

Workaround: To prevent such errors, configure each host running an impalad daemon with the following settings:

echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count

Add the following lines in /etc/security/limits.conf:

impala soft nproc 262144
impala hard nproc 262144
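
To see whether a host already meets the kernel settings in this workaround, the values can be read back from /proc/sys. The following is a minimal sketch that assumes a Linux host; find_low_settings and read_int are hypothetical helper names, and the nproc limits from limits.conf are not checked here.

```python
# Minimum values taken from the workaround above.
RECOMMENDED = {
    "/proc/sys/kernel/threads-max": 2000000,
    "/proc/sys/kernel/pid_max": 2000000,
    "/proc/sys/vm/max_map_count": 8000000,
}


def read_int(path):
    """Read a single integer from a /proc/sys file."""
    with open(path) as handle:
        return int(handle.read().strip())


def find_low_settings(recommended=RECOMMENDED, reader=read_int):
    """Return {path: (current, wanted)} for every setting below its target."""
    low = {}
    for path, wanted in recommended.items():
        current = reader(path)
        if current < wanted:
            low[path] = (current, wanted)
    return low
```

Calling find_low_settings() on an impalad host returns an empty dict when all three settings are at or above the suggested values. The reader argument exists only so the check can be exercised without touching /proc.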

An OUTER JOIN query could omit some expected result rows due to a constant such as FALSE in another join clause. For example:

explain SELECT 1 FROM alltypestiny a1
  INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
  RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
+---------------------------------------------------------+
| Explain String                                          |
+---------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
|                                                         |
| 00:EMPTYSET                                             |
+---------------------------------------------------------+
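
To make the expected behavior concrete: the RIGHT JOIN should preserve every row of alltypes even though the inner join produces nothing, so an EMPTYSET plan drops rows. The sketch below reproduces the shape of the query in SQLite as a stand-in for Impala. The table contents are made up for illustration, the constant false is written as 0, and the RIGHT JOIN is rewritten as an equivalent LEFT JOIN because older SQLite builds lack RIGHT JOIN.

```python
import sqlite3


def expected_rows():
    """Run the query shape above on toy data and return its result rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE alltypestiny (smallint_col INTEGER, bigint_col INTEGER, year INTEGER);
        CREATE TABLE alltypesagg  (year INTEGER);
        CREATE TABLE alltypes     (id INTEGER);
        INSERT INTO alltypestiny VALUES (1, 2009, 2009);
        INSERT INTO alltypesagg  VALUES (1);
        INSERT INTO alltypes     VALUES (10), (20), (30);
    """)
    try:
        # "x RIGHT JOIN a3 ON cond" is written as "a3 LEFT JOIN x ON cond".
        return conn.execute("""
            SELECT 1
            FROM alltypes a3
            LEFT JOIN (
                SELECT a1.year AS year, a1.bigint_col AS bigint_col
                FROM alltypestiny a1
                INNER JOIN alltypesagg a2
                    ON a1.smallint_col = a2.year AND 0
            ) j ON j.year = j.bigint_col
        """).fetchall()
    finally:
        conn.close()


rows = expected_rows()
print(len(rows))  # one row per row of alltypes
```

With three rows in the toy alltypes table the query returns three rows, which is the result the outer join semantics call for.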

In Impala 3.2 and higher, if the following error appears multiple times within a short period while a query is running, the connection between the impalad and the HDFS NameNode is in a bad state and the impalad must be restarted:

"hdfsOpenFile() for <filename> at backend <hostname:port> failed to finish before the <hdfs_operation_timeout_sec> second timeout"

ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL' = 'TRUE');

A table and a database that share the same name can cause a query failure if the table is not readable by Impala, for example, if the table was created in Hive in the Open CSV Serde format. The following exception is returned:

CAUSED BY: TableLoadingException: Unrecognized table type for table

Suggestion : 4

I was able to connect to HiveServer2 via a Java client, so the connectivity issue seems to be specific to Python/impyla. When I step into the code in a debugger, it hangs at line 873 of hiveserver2.py.

Exactly a year later, I am still getting this issue. It seems to hang consistently with certain queries (which return only ~200 rows at most) that are near-instant in database query tools such as DBeaver. Other queries work fine, even when they are more complicated and return more records.

It would be nice to understand this issue if you do figure it out; unfortunately I don't have the bandwidth to help debug it further. If you are able to sort it out (and if it is an impyla bug), please let me know the resolution here. cc @mjacobs

When I try to run the following code, the client hangs when trying to connect to Hive:

from impala.dbapi import connect

conn = connect(host='host_running_hs2_service', port=10000, user='awoolford', password='Bzzzzz')
cursor = conn.cursor()  # <- hangs here
cursor.execute('show tables')
results = cursor.fetchall()
print(results)

The hang occurs at TOpenSessionReq:

Attempting to open transport (tries_left=2)
Transport opened
Establishing Connection
Connecting to HiveServer2 hostname:25003 with PLAIN authentication mechanism
get_socket: host=hostname port=25003 use_ssl=False ca_cert=None
sock=<thrift.transport.TSocket.TSocket instance at 0x7f765fea0aa0>
get_transport: socket=<thrift.transport.TSocket.TSocket instance at 0x7f765fea0aa0> host=hostname kerberos_service_name=impala auth_mechanism=PLAIN user=user password=fuggetaboutit
transport=<thrift_sasl.TSaslClientTransport instance at 0x7f765fea0e60> protocol=<thrift.protocol.TBinaryProtocol.TBinaryProtocolAccelerated instance at 0x7f765fea7140> service=<impala._thrift_gen.ImpalaService.ImpalaHiveServer2Service.Client object at 0x7f765fe9dd50>
HiveServer2Connection(service=<impala.hiveserver2.HS2Service object at 0x7f765fe9dd90>, default_db=co5012_cpi_int)
Connection Established
Acquiring Cursor
Getting a cursor (Impala session)
.cursor(): getting new session_handle
OpenSession: req=TOpenSessionReq(username='root', password=None, client_protocol=5, configuration=None)
Attempting to open transport (tries_left=3)
Transport opened
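
Debug output like the log above can be captured with Python's standard logging module. The sketch below assumes that impyla's modules log under names beginning with "impala" (for example impala.hiveserver2, as the file paths in the tracebacks suggest); enable_impyla_debug_logging is a hypothetical helper name.

```python
import logging


def enable_impyla_debug_logging(stream=None):
    """Send DEBUG records from the 'impala' logger hierarchy to a stream."""
    logger = logging.getLogger("impala")   # assumed parent of impyla's loggers
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling enable_impyla_debug_logging() before connect() shows how far the handshake gets before the client stops responding, which helps narrow a hang down to a specific request such as OpenSession.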

Suggestion : 5

Python client for HiveServer2 implementations (e.g., Impala, Hive) for distributed query engines.

HiveServer2 compliant; works with Impala and Hive, including nested data.

Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience.

Ubuntu:

apt-get install libkrb5-dev krb5-user

RHEL/CentOS:

yum install krb5-libs krb5-devel krb5-server krb5-workstation

Install the latest release with pip:

pip install impyla

or clone the repo:

git clone https://github.com/cloudera/impyla.git
cd impyla
python setup.py install

impyla uses the pytest toolchain, and depends on the following environment variables:

export IMPYLA_TEST_HOST=your.impalad.com
export IMPYLA_TEST_PORT=21050
export IMPYLA_TEST_AUTH_MECH=NOSASL
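
As an illustration of how a script might consume these variables, here is a minimal sketch. It is not impyla's own test harness: load_test_config is a hypothetical helper, and the fallback values for the port and auth mechanism simply mirror the example exports above.

```python
import os


def load_test_config(environ=None):
    """Collect the IMPYLA_TEST_* settings from an environment mapping."""
    env = os.environ if environ is None else environ
    return {
        # Required: raises KeyError when the variable is not set.
        "host": env["IMPYLA_TEST_HOST"],
        # Fallbacks below are assumptions taken from the example exports.
        "port": int(env.get("IMPYLA_TEST_PORT", "21050")),
        "auth_mechanism": env.get("IMPYLA_TEST_AUTH_MECH", "NOSASL"),
    }
```

The resulting dict uses the same keyword names as impala.dbapi.connect, so it can be passed along as connect(**load_test_config()).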