boto3 - download file only if modified since specified timestamp

I think you should initialize m_timestamp with a time that is before the object's timestamp for sure, and then, to be safe, read it from the response instead of taking it from the time you made the request (otherwise you wouldn't notice if the object has been modified again in the polling interval).

def poll_s3(timestamp):
    response = client.get_object(
        Bucket=bucket_name,
        Key=my_file,
        IfModifiedSince=timestamp
    )
    return response

m_timestamp = datetime(2015, 1, 1)  # like in the example request in the docs

while True:
    sleep(5)
    try:
        response = poll_s3(m_timestamp)
        m_timestamp = response['LastModified']
        print('Modified at', m_timestamp)
    except botocore.exceptions.ClientError as e:
        print('Not modified since', m_timestamp)
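One detail worth noting: when the object has not changed, get_object with IfModifiedSince fails with an HTTP 304, so it is safer to check the status code instead of swallowing every ClientError, and to actually write the response body to disk on success. A minimal sketch of the loop along those lines, reusing poll_s3, client and m_timestamp from above (the local filename local_copy.txt is only an illustration):

while True:
    sleep(5)
    try:
        response = poll_s3(m_timestamp)
        m_timestamp = response['LastModified']
        # Save the streamed body; the local filename is just a placeholder.
        with open('local_copy.txt', 'wb') as f:
            f.write(response['Body'].read())
        print('Downloaded, modified at', m_timestamp)
    except botocore.exceptions.ClientError as e:
        if e.response['ResponseMetadata']['HTTPStatusCode'] == 304:
            print('Not modified since', m_timestamp)
        else:
            raise  # a real error (permissions, missing key, ...), don't hide it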

Suggestion : 2

I'm trying to download a file from AWS S3 based on the file's last-modified attribute. Currently I cannot use any other method except making repeated calls for the file and downloading it if it appears to have been changed/modified. The idea is to start with a timestamp and see if the file has been modified since then. If yes, download it and update the timestamp of the last download; if not, ignore it and retry in 5 seconds. My script keeps reporting the file as modified even though it hasn't been modified in days. I'm also not sure whether IfModifiedSince has to be specified in GMT.

This is what I have:

import boto3
import botocore
from datetime import datetime
from time import sleep

session = boto3.Session(profile_name='test', region_name='us-west-2')
client = session.client('s3')

bucket_name = 'my_bucket'
my_file = 'testfile.txt'

def poll_s3(timestamp):
    response = client.get_object(
        Bucket=bucket_name,
        Key=my_file,
        IfModifiedSince=timestamp
    )
    print(response)

m_timestamp = datetime.now()

while True:
    sleep(5)
    try:
        poll_s3(m_timestamp)
        m_timestamp = datetime.now()
        print('modified at', m_timestamp)
    except botocore.exceptions.ClientError as e:
        print('Not modified at', m_timestamp)

However my script keeps printing

modified at 2019-09-03 07:37:46.102198
modified at 2019-09-03 07:37:51.262606
modified at 2019-09-03 07:37:56.455355
modified at 2019-09-03 07:38:01.608554
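On the GMT point from the question: IfModifiedSince takes a datetime, and passing a timezone-aware UTC value avoids any ambiguity about the local clock. A minimal sketch (not part of the original question):

from datetime import datetime, timezone

# A timezone-aware "now" in UTC, suitable for IfModifiedSince
m_timestamp = datetime.now(timezone.utc)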

Suggestion : 3

Problem Statement: Use the boto3 library in Python to get a list of files from S3 that were modified after a given date timestamp.

The following code gets the list of files from AWS S3 based on the last modified date timestamp −

import boto3
from botocore.exceptions import ClientError

def list_all_objects_based_on_last_modified(s3_files_path, last_modified_timestamp):
   if 's3://' not in s3_files_path:
      raise Exception('Given path is not a valid s3 path.')

   # Split "s3://bucket/folder/..." into the bucket name and the key prefix
   session = boto3.session.Session()
   s3_resource = session.resource('s3')
   bucket_token = s3_files_path.split('/')
   bucket = bucket_token[2]
   folder_path = bucket_token[3:]
   prefix = ""
   for path in folder_path:
      prefix = prefix + path + '/'

   try:
      result = s3_resource.meta.client.list_objects(Bucket=bucket, Prefix=prefix)
   except ClientError as e:
      raise Exception("boto3 client error in list_all_objects_based_on_last_modified function: " + e.__str__())
   except Exception as e:
      raise Exception("Unexpected error in list_all_objects_based_on_last_modified function of s3 helper: " + e.__str__())

   # Keep only the objects whose LastModified is at or after the given timestamp
   filtered_file_names = []
   for obj in result['Contents']:
      if str(obj["LastModified"]) >= str(last_modified_timestamp):
         full_s3_file = "s3://" + bucket + "/" + obj["Key"]
         filtered_file_names.append(full_s3_file)
   return filtered_file_names

# give a timestamp to fetch test.zip
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder", "2021-01-21 13:19:56.986445+00:00"))
# give a timestamp no file is modified after that
print(list_all_objects_based_on_last_modified("s3://Bucket_1/testfolder", "2021-01-21 13:19:56.986445+00:00"))

Output

# give a timestamp to fetch test.zip
['s3://Bucket_1/testfolder/test.zip']
# give a timestamp no file is modified after that
[]
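
One caveat with the listing approach above: list_objects returns at most 1000 keys per call, so larger prefixes need pagination, and comparing the datetime objects directly is more robust than comparing their string representations. A minimal sketch using a boto3 paginator (the bucket and prefix names are just placeholders):

import boto3
from datetime import datetime, timezone

def list_objects_modified_since(bucket, prefix, since):
    # 'since' should be timezone-aware; S3's LastModified values are UTC.
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    matching = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get('Contents', []):
            if obj['LastModified'] >= since:
                matching.append("s3://" + bucket + "/" + obj['Key'])
    return matching

# Example call with placeholder names
print(list_objects_modified_since('Bucket_1', 'testfolder/',
                                  datetime(2021, 1, 21, tzinfo=timezone.utc)))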

Suggestion : 4

These excerpts are from the AWS SDK for Ruby documentation for Aws::S3::ObjectSummary. #copy_from can copy the object only if it has been modified since a specified time (copy_source_if_modified_since) or only if it hasn't been modified since a specified time (copy_source_if_unmodified_since). #download_file(destination, options = {}) ⇒ Boolean returns true when the file is downloaded without any errors.

objectsummary.copy_from({
   acl: "private", # accepts private, public-read, public-read-write, authenticated-read, aws-exec-read, bucket-owner-read, bucket-owner-full-control
   cache_control: "CacheControl",
   content_disposition: "ContentDisposition",
   content_encoding: "ContentEncoding",
   content_language: "ContentLanguage",
   content_type: "ContentType",
   copy_source: "CopySource", # required
   copy_source_if_match: "CopySourceIfMatch",
   copy_source_if_modified_since: Time.now,
   copy_source_if_none_match: "CopySourceIfNoneMatch",
   copy_source_if_unmodified_since: Time.now,
   expires: Time.now,
   grant_full_control: "GrantFullControl",
   grant_read: "GrantRead",
   grant_read_acp: "GrantReadACP",
   grant_write_acp: "GrantWriteACP",
   metadata: {
      "MetadataKey" => "MetadataValue",
   },
   metadata_directive: "COPY", # accepts COPY, REPLACE
   tagging_directive: "COPY", # accepts COPY, REPLACE
   server_side_encryption: "AES256", # accepts AES256, aws:kms
   storage_class: "STANDARD", # accepts STANDARD, REDUCED_REDUNDANCY, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, GLACIER, DEEP_ARCHIVE, OUTPOSTS
   website_redirect_location: "WebsiteRedirectLocation",
   sse_customer_algorithm: "SSECustomerAlgorithm",
   sse_customer_key: "SSECustomerKey",
   sse_customer_key_md5: "SSECustomerKeyMD5",
   ssekms_key_id: "SSEKMSKeyId",
   ssekms_encryption_context: "SSEKMSEncryptionContext",
   copy_source_sse_customer_algorithm: "CopySourceSSECustomerAlgorithm",
   copy_source_sse_customer_key: "CopySourceSSECustomerKey",
   copy_source_sse_customer_key_md5: "CopySourceSSECustomerKeyMD5",
   request_payer: "requester", # accepts requester
   tagging: "TaggingHeader",
   object_lock_mode: "GOVERNANCE", # accepts GOVERNANCE, COMPLIANCE
   object_lock_retain_until_date: Time.now,
   object_lock_legal_hold_status: "ON", # accepts ON, OFF
   expected_bucket_owner: "AccountId",
   expected_source_bucket_owner: "AccountId",
   use_accelerate_endpoint: false,
})
# File 'aws-sdk-resources/lib/aws-sdk-resources/services/s3/object_summary.rb', line 11

def copy_from(source, options = {})
   object.copy_from(source, options)
end

# File 'aws-sdk-resources/lib/aws-sdk-resources/services/s3/object_summary.rb', line 19

def copy_to(target, options = {})
   object.copy_to(target, options)
end

objectsummary.delete({
   mfa: "MFA",
   version_id: "ObjectVersionId",
   request_payer: "requester", # accepts requester
   bypass_governance_retention: false,
   expected_bucket_owner: "AccountId",
   use_accelerate_endpoint: false,
})
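
For the Python side of the same idea, boto3's copy_object supports the equivalent CopySourceIfModifiedSince condition. A minimal sketch, with placeholder bucket and key names (if the source has not been modified since the timestamp, S3 rejects the copy with a 412 Precondition Failed error):

import boto3
from datetime import datetime, timezone

s3 = boto3.client('s3')

# Copy only if the source object changed after the given (placeholder) timestamp.
s3.copy_object(
    Bucket='dest-bucket',                      # placeholder destination bucket
    Key='copied/testfile.txt',                 # placeholder destination key
    CopySource={'Bucket': 'my_bucket', 'Key': 'testfile.txt'},
    CopySourceIfModifiedSince=datetime(2021, 1, 1, tzinfo=timezone.utc),
)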

Suggestion : 5

I need to fetch a list of items from S3 using Boto3, but instead of the default sort order I want the results returned in reverse order by last-modified date. I am currently fetching all the files and then sorting, but that seems overkill, especially if I only care about the 10 or so most recent files. The filter system seems to only accept Prefix for S3, nothing else. How do I solve this issue?

I know you can do it via awscli:

aws s3api list-objects --bucket mybucketfoo --query "reverse(sort_by(Contents,&LastModified))"

I tried the following method. It's not 100% optimal, but it gets the job done given the limitations boto3 has as of this time.

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')

# get_last_modified was referenced but not defined in the original snippet;
# sorting by the object's last_modified attribute is one reasonable choice.
get_last_modified = lambda obj: obj.last_modified

unsorted = []
for file in my_bucket.objects.filter():
   unsorted.append(file)

files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]
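
Worth noting as a design point: the S3 list API itself has no server-side sort option, so both the awscli --query approach (JMESPath runs on the client) and the snippet above end up listing the keys and ordering them locally; for very large buckets, something like an S3 Inventory report is usually a better fit than repeatedly listing and sorting.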