"errorMessage": "Parameter validation failed:\nUnknown parameter in input: \"WorkGroup\", must be one of: QueryString, ClientRequestToken, QueryExecutionContext, ResultConfiguration",
query_id = client.start_query_execution(
    QueryString=query_string,
    QueryExecutionContext={
        'Database': 'sample_db'
    },
    ResultConfiguration={
        'OutputLocation': 's3://sample_db/output/'
    },
    WorkGroup='workgroup1'
)['QueryExecutionId']
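The "Unknown parameter in input: WorkGroup" error above usually means the installed boto3/botocore predates workgroup support, so upgrading boto3 (pip install --upgrade boto3) resolves it. As a rough sketch, you could gate the WorkGroup argument on the installed version; the 1.12.0 threshold below is an assumption, not a documented cutoff, so verify it against the boto3 changelog:

```python
def supports_workgroup(version, minimum=(1, 12, 0)):
    """Compare a boto3/botocore version string such as '1.9.42' against a
    minimum release tuple. The (1, 12, 0) default is an assumption --
    check the boto3 changelog for the exact release that added WorkGroup."""
    parts = tuple(int(p) for p in version.split('.')[:3])
    return parts >= minimum

print(supports_workgroup('1.9.42'))   # False -- older releases reject WorkGroup
print(supports_workgroup('1.26.0'))   # True
```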
- Creates a named query in the specified workgroup. Requires that you have access to the workgroup.
- Specifies the ARN of the Athena resource (workgroup or data catalog) to which tags are to be added.
- If a query runs in a workgroup and the workgroup overrides client-side settings, then the workgroup's setting for encryption is used. It specifies whether query results must be encrypted for all queries that run in this workgroup.
- Lists the prepared statements in the specified workgroup.
import boto3

client = boto3.client('athena')

response = client.batch_get_named_query(
    NamedQueryIds=[
        'string',
    ]
)
{
    'NamedQueries': [
        {
            'Name': 'string',
            'Description': 'string',
            'Database': 'string',
            'QueryString': 'string',
            'NamedQueryId': 'string',
            'WorkGroup': 'string'
        },
    ],
    'UnprocessedNamedQueryIds': [
        {
            'NamedQueryId': 'string',
            'ErrorCode': 'string',
            'ErrorMessage': 'string'
        },
    ]
}
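Given the response shape above, a small helper can separate successful lookups from failures; the sample data below is made up to match the documented structure:

```python
def split_named_query_results(response):
    """Return ({id: name}, [failed ids]) from a BatchGetNamedQuery response."""
    found = {q['NamedQueryId']: q['Name'] for q in response.get('NamedQueries', [])}
    failed = [u['NamedQueryId'] for u in response.get('UnprocessedNamedQueryIds', [])]
    return found, failed

# Illustrative sample matching the response syntax shown above.
sample = {
    'NamedQueries': [{'Name': 'daily_report', 'Description': '', 'Database': 'db',
                      'QueryString': 'SELECT 1', 'NamedQueryId': 'id-1',
                      'WorkGroup': 'primary'}],
    'UnprocessedNamedQueryIds': [{'NamedQueryId': 'id-2', 'ErrorCode': 'NotFound',
                                  'ErrorMessage': 'Query not found'}],
}
found, failed = split_named_query_results(sample)
print(found)   # {'id-1': 'daily_report'}
print(failed)  # ['id-2']
```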
response = client.batch_get_prepared_statement(
    PreparedStatementNames=[
        'string',
    ],
    WorkGroup='string'
)
{
    'PreparedStatements': [
        {
            'StatementName': 'string',
            'QueryStatement': 'string',
            'WorkGroupName': 'string',
            'Description': 'string',
            'LastModifiedTime': datetime(2015, 1, 1)
        },
    ],
    'UnprocessedPreparedStatementNames': [
        {
            'StatementName': 'string',
            'ErrorCode': 'string',
            'ErrorMessage': 'string'
        },
    ]
}
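Because LastModifiedTime comes back as a datetime, responses in this shape are easy to sort locally. A sketch, using made-up sample data that mirrors the structure above:

```python
from datetime import datetime

def newest_prepared_statement(response):
    """Pick the most recently modified statement from a
    BatchGetPreparedStatement-shaped response."""
    statements = response.get('PreparedStatements', [])
    if not statements:
        return None
    return max(statements, key=lambda s: s['LastModifiedTime'])['StatementName']

# Illustrative sample; the names and timestamps are invented.
sample = {
    'PreparedStatements': [
        {'StatementName': 'old_stmt', 'QueryStatement': 'SELECT 1',
         'WorkGroupName': 'primary', 'Description': '',
         'LastModifiedTime': datetime(2021, 1, 1)},
        {'StatementName': 'new_stmt', 'QueryStatement': 'SELECT 2',
         'WorkGroupName': 'primary', 'Description': '',
         'LastModifiedTime': datetime(2022, 5, 3)},
    ],
    'UnprocessedPreparedStatementNames': [],
}
print(newest_prepared_statement(sample))  # new_stmt
```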
response = client.batch_get_query_execution(
    QueryExecutionIds=[
        'string',
    ]
)
- Specifies information about where and how to save the results of the query execution. If the query runs in a workgroup, then the workgroup's settings may override query settings. This affects the query results location. The workgroup settings override is specified in EnforceWorkGroupConfiguration (true/false) in the WorkGroupConfiguration. See WorkGroupConfiguration:EnforceWorkGroupConfiguration.
- Runs the SQL query statements contained in the Query. Requires you to have access to the workgroup in which the query ran. Running queries against an external catalog requires GetDataCatalog permission to the catalog. For code samples using the AWS SDK for Java, see Examples and Code Samples in the Amazon Athena User Guide.
- A list of values for the parameters in a query. The values are applied sequentially to the parameters in the query in the order in which the parameters occur.
- Indicates that something is wrong with the input to the request. For example, a required parameter may be missing or out of range.
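The note that ExecutionParameters are applied sequentially can be illustrated with a purely local sketch. Athena performs the real binding server-side; this toy function only mimics the documented order for '?' placeholders:

```python
def bind_sequentially(query, values):
    """Mimic the documented order: each value fills the next '?' placeholder.
    Illustration only -- Athena binds ExecutionParameters server-side."""
    for value in values:
        query = query.replace('?', value, 1)
    return query

q = 'SELECT * FROM sales WHERE region = ? AND year = ?'
print(bind_sequentially(q, ["'EMEA'", '2022']))
# SELECT * FROM sales WHERE region = 'EMEA' AND year = 2022
```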
Request Syntax
{
    "ClientRequestToken": "string",
    "ExecutionParameters": ["string"],
    "QueryExecutionContext": {
        "Catalog": "string",
        "Database": "string"
    },
    "QueryString": "string",
    "ResultConfiguration": {
        "AclConfiguration": {
            "S3AclOption": "string"
        },
        "EncryptionConfiguration": {
            "EncryptionOption": "string",
            "KmsKey": "string"
        },
        "ExpectedBucketOwner": "string",
        "OutputLocation": "string"
    },
    "WorkGroup": "string"
}
Response Syntax
{
"QueryExecutionId": "string"
}
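To tie the request syntax back to Python, a minimal builder might look like the sketch below; the function name and defaults are illustrative, not part of the API:

```python
def build_start_query_request(query, database, output_location,
                              work_group=None, catalog=None):
    """Assemble a StartQueryExecution-shaped request dict matching the
    request syntax above. Whether ResultConfiguration can be omitted
    depends on the workgroup's settings."""
    request = {
        'QueryString': query,
        'QueryExecutionContext': {'Database': database},
        'ResultConfiguration': {'OutputLocation': output_location},
    }
    if catalog:
        request['QueryExecutionContext']['Catalog'] = catalog
    if work_group:
        request['WorkGroup'] = work_group
    return request

req = build_start_query_request('SELECT 1', 'sample_db',
                                's3://sample_db/output/', work_group='primary')
print(sorted(req))
```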
Last Updated: November 17, 2021 (revised 03/30/2022, 03/31/2022, 05/03/2022)
To create a named query, use the create_named_query() method and pass the required parameters: the Name of the query, the Database, and the QueryString that you want to run on the data stored in S3.
import boto3

client = boto3.client('athena')

response = client.create_named_query(
    Name='TestQuery',
    Database='wbc',
    QueryString='SELECT * FROM "wbc"."thirdpartydata" limit 10;',
)
print(response)
To list the available named queries in Athena, use the list_named_queries() method. It accepts optional parameters such as MaxResults, which limits the number of query IDs returned, and WorkGroup, which selects the workgroup whose queries are listed.
import boto3
client = boto3.client('athena')
response = client.list_named_queries(
    MaxResults=10,
    WorkGroup='primary'
)
print(response)
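Since MaxResults caps a single page, listing everything means following NextToken. A sketch of that loop, shown against a stand-in client so the example stays self-contained (FakeClient and its canned pages are invented for illustration):

```python
def list_all_named_queries(client, work_group='primary'):
    """Collect NamedQueryIds across pages by following NextToken."""
    ids, token = [], None
    while True:
        kwargs = {'WorkGroup': work_group}
        if token:
            kwargs['NextToken'] = token
        page = client.list_named_queries(**kwargs)
        ids.extend(page.get('NamedQueryIds', []))
        token = page.get('NextToken')
        if not token:
            return ids

class FakeClient:
    """Stand-in client that serves two canned pages (illustration only)."""
    def __init__(self):
        self.pages = [
            {'NamedQueryIds': ['id-1', 'id-2'], 'NextToken': 'tok'},
            {'NamedQueryIds': ['id-3']},
        ]
    def list_named_queries(self, **kwargs):
        return self.pages.pop(0)

print(list_all_named_queries(FakeClient()))  # ['id-1', 'id-2', 'id-3']
```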
To get a named query, use the get_named_query() method and pass the required parameter, NamedQueryId.
import boto3
client = boto3.client('athena')
response = client.get_named_query(
    NamedQueryId='b1fc5c6e-cd83-451e-b69f-6d7b1ae73795'
)
print(response)
To delete a named query, use the delete_named_query() method and pass the required parameter, NamedQueryId.
import boto3
client = boto3.client('athena')
response = client.delete_named_query(
    NamedQueryId='b1fc5c6e-cd83-451e-b69f-6d7b1ae73795'
)
print(response)
To create a prepared statement, use the create_prepared_statement() method and pass the required parameters: the StatementName, the WorkGroup name, and the QueryStatement.
import boto3
client = boto3.client('athena')
response = client.create_prepared_statement(
    StatementName='Test_Statement',
    WorkGroup='primary',
    QueryStatement='SELECT * FROM "wbc"."thirdpartydata";',
)
print(response)
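Once created, a prepared statement is run by sending an EXECUTE statement through start_query_execution(). A hedged sketch of composing that SQL string; the USING clause follows the Athena SQL syntax, but quoting conventions for your parameter types should be verified against the Athena documentation:

```python
def build_execute_sql(statement_name, parameters=()):
    """Compose an EXECUTE statement for a prepared statement.
    Quoting of parameter values is the caller's responsibility --
    verify the rules for your types against the Athena docs."""
    sql = 'EXECUTE ' + statement_name
    if parameters:
        sql += ' USING ' + ', '.join(parameters)
    return sql

print(build_execute_sql('Test_Statement', ["'2022-01-01'"]))
# EXECUTE Test_Statement USING '2022-01-01'
```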
Comment: This is very clear and concise. Have you tried such code in the context of a query for a web page backend? I am curious what the performance of something like this looks like, and whether it is only functional for ETL.

Reply (Ilkka Peltola, March 27, 2019): Heyy, thanks for the question! First comment on the blog. Cheers! 🍺 I have not, but off the bat I would recommend against it. Even with simple data sets and queries, total response times easily get up to 1-2 seconds. If that's OK though? How real-time would it need to be? If possible, I would consider caching the result CSV in S3 and only going back to Athena when you need to refresh the file.
import boto3
import pandas as pd
import io
import re
import time

params = {
    'region': 'eu-central-1',
    'database': 'databasename',
    'bucket': 'your-bucket-name',
    'path': 'temp/athena/output',
    'query': 'SELECT * FROM tablename LIMIT 100'
}

session = boto3.Session()

def athena_query(client, params):
    response = client.start_query_execution(
        QueryString=params['query'],
        QueryExecutionContext={
            'Database': params['database']
        },
        ResultConfiguration={
            'OutputLocation': 's3://' + params['bucket'] + '/' + params['path']
        }
    )
    return response
The athena_to_s3() function then:
- dispatches the query to Athena,
- polls the execution status until the query finishes, and
- returns the filename in S3 where the query results are stored.
def athena_to_s3(session, params, max_execution=5):
    client = session.client('athena', region_name=params['region'])
    execution = athena_query(client, params)
    execution_id = execution['QueryExecutionId']
    state = 'RUNNING'
    while max_execution > 0 and state in ['RUNNING', 'QUEUED']:
        max_execution = max_execution - 1
        response = client.get_query_execution(QueryExecutionId=execution_id)
        if 'QueryExecution' in response and \
                'Status' in response['QueryExecution'] and \
                'State' in response['QueryExecution']['Status']:
            state = response['QueryExecution']['Status']['State']
            if state == 'FAILED':
                return False
            elif state == 'SUCCEEDED':
                s3_path = response['QueryExecution']['ResultConfiguration']['OutputLocation']
                filename = re.findall(r'.*\/(.*)', s3_path)[0]
                return filename
        time.sleep(1)
    return False
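The polling loop inside athena_to_s3() can be factored into a reusable helper. A sketch (the name poll_until_done and its signature are made up; fetch_state stands for any zero-argument callable that returns the current query state):

```python
import time

def poll_until_done(fetch_state, max_attempts=5, delay=1.0):
    """Call fetch_state() until the state leaves RUNNING/QUEUED or the
    attempt budget is spent. Returns the terminal state, or None if the
    query was still running when attempts ran out."""
    for _ in range(max_attempts):
        state = fetch_state()
        if state not in ('RUNNING', 'QUEUED'):
            return state
        time.sleep(delay)
    return None

# Simulated state sequence, as get_query_execution might report it.
states = iter(['QUEUED', 'RUNNING', 'SUCCEEDED'])
print(poll_until_done(lambda: next(states), delay=0))  # SUCCEEDED
```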