Using the approach from the linked answer:
import dateutil.parser as dparser
s.apply(lambda x: dparser.parse(x, fuzzy = True).strftime('%Y-%m-%d'))
If you need exception handling, you'll have to define a regular function as lambdas can't handle try/except:
def myparser(x):
    try:
        return dparser.parse(x, fuzzy=True)
    except Exception:
        # Unparseable input: return None, which pandas shows as NaT
        return None

s.apply(myparser)
This will insert NaT values for wrong dates (or you can provide a 'default date' if you like):
0   1989-10-12
1          NaT
2   1987-12-29
3   1983-07-12
4          NaT
5   2019-05-16
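The 'default date' variant mentioned above can be sketched as follows. This is only an illustration: the helper name parse_or_default and the fallback of 1900-01-01 are assumptions, not part of the original answer. dateutil's parse() takes a default argument whose fields fill in whatever the string leaves out, and the same value is returned here when parsing fails.

```python
from datetime import datetime

import dateutil.parser as dparser

# Hypothetical fallback date; pick whatever suits your data.
FALLBACK = datetime(1900, 1, 1)

def parse_or_default(text, default=FALLBACK):
    """Return the date found in text, or the default if none can be parsed."""
    try:
        # Fields missing from the string (e.g. the day) are taken from default.
        return dparser.parse(text, fuzzy=True, default=default)
    except (ValueError, OverflowError):
        # dateutil raises ParserError (a ValueError subclass) or OverflowError.
        return default

print(parse_or_default('footballer, born 29 December 1987'))
print(parse_or_default('31/02/1901'))
```

With a pandas Series this would be applied as s.apply(parse_or_default).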
Try this: if it can't recognize a row as containing a date, it returns 0001-01-01. If the date is incomplete (the month or the day is missing), it assumes January and the 1st respectively. You can change both behaviours by adjusting the default.
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil.parser import parse
l = ['footballer, born October 1989',
     'footballer, born 1900s',
     'footballer, born 29 December 1987',
     'Brazilian footballer, born 1983',
     '31/02/1901',
     '16 May 2019'
     ]
df = pd.Series(l, name='strings')

def get_dates(series):
    my_list = []
    for i in range(len(series)):
        # Try every suffix of the string until one parses as a date
        for j in range(len(series[i])):
            try:
                my_list.append(parse(series[i][j:],
                                     default=datetime(1, 1, 1)).strftime('%Y-%m-%d'))
                break
            except Exception:
                pass
    return pd.Series(my_list)

get_dates(df)
0    1989-10-01
1    0001-01-01
2    1987-12-29
3    1983-01-01
4    1901-01-02
5    2019-05-16
dtype: object
Last Updated: 09 May, 2021
Input: test_str = "gfg at 2021-01-04"
Output: 2021-01-04
Explanation: Date format string found.
Input: test_str = "2021-01-04 for gfg"
Output: 2021-01-04
Explanation: Date format string found.
Output:
The original string is: gfg at 2021-01-04
Computed date: 2021-01-04
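The code that produced this output is not included here. One way to get the same result, offered as a sketch rather than the original article's method, is to locate a YYYY-MM-DD pattern with a regular expression and validate it with datetime.strptime. The function name find_date is an assumption made for illustration.

```python
import re
from datetime import datetime

def find_date(text):
    """Return the first YYYY-MM-DD date in text as a date object, or None."""
    match = re.search(r'\d{4}-\d{2}-\d{2}', text)
    if match is None:
        return None
    try:
        # strptime rejects impossible dates such as 2021-13-40
        return datetime.strptime(match.group(), '%Y-%m-%d').date()
    except ValueError:
        return None

test_str = "gfg at 2021-01-04"
print("The original string is:", test_str)
print("Computed date:", find_date(test_str))
```

This only handles the single format shown in the examples; free-form dates need a parser such as dateutil.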
Using pandas datetime properties, Datetime as index, How to handle time series data with ease?, What is the start and end date of the time series data set we are working with?
In [1]: import pandas as pd
In [2]: import matplotlib.pyplot as plt
In [3]: air_quality = pd.read_csv("data/air_quality_no2_long.csv")
In [4]: air_quality = air_quality.rename(columns={"date.utc": "datetime"})
In [5]: air_quality.head()
Out[5]:
    city country                   datetime location parameter  value   unit
0  Paris      FR  2019-06-21 00:00:00+00:00  FR04014       no2   20.0  µg/m³
1  Paris      FR  2019-06-20 23:00:00+00:00  FR04014       no2   21.8  µg/m³
2  Paris      FR  2019-06-20 22:00:00+00:00  FR04014       no2   26.5  µg/m³
3  Paris      FR  2019-06-20 21:00:00+00:00  FR04014       no2   24.9  µg/m³
4  Paris      FR  2019-06-20 20:00:00+00:00  FR04014       no2   21.4  µg/m³
In [6]: air_quality.city.unique()
Out[6]: array(['Paris', 'Antwerpen', 'London'], dtype=object)
In [7]: air_quality["datetime"] = pd.to_datetime(air_quality["datetime"])
In [8]: air_quality["datetime"]
Out[8]:
0      2019-06-21 00:00:00+00:00
1      2019-06-20 23:00:00+00:00
2      2019-06-20 22:00:00+00:00
3      2019-06-20 21:00:00+00:00
4      2019-06-20 20:00:00+00:00
                  ...
2063   2019-05-07 06:00:00+00:00
2064   2019-05-07 04:00:00+00:00
2065   2019-05-07 03:00:00+00:00
2066   2019-05-07 02:00:00+00:00
2067   2019-05-07 01:00:00+00:00
Name: datetime, Length: 2068, dtype: datetime64[ns, UTC]
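Alternatively, read_csv() can convert the column to datetimes while reading the file, via its parse_dates parameter:
pd.read_csv("../data/air_quality_no2_long.csv", parse_dates=["datetime"])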
In [9]: air_quality["datetime"].min(), air_quality["datetime"].max()
Out[9]:
(Timestamp('2019-05-07 01:00:00+0000', tz='UTC'),
 Timestamp('2019-06-21 00:00:00+0000', tz='UTC'))
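For readers without the CSV file, the same steps can be tried on inline data. The sketch below is an assumption-laden stand-in: the three rows and the reduced set of columns are made up for illustration (only the timestamps echo the session above), and the file is simulated with io.StringIO.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for air_quality_no2_long.csv.
csv_text = """city,datetime,value
Paris,2019-06-21 00:00:00+00:00,20.0
Paris,2019-06-20 23:00:00+00:00,21.8
Paris,2019-05-07 01:00:00+00:00,25.0
"""

# parse_dates converts the named column to datetimes while reading.
air_quality = pd.read_csv(io.StringIO(csv_text), parse_dates=["datetime"])

# With a datetime column, min() and max() give the start and end of the series.
start = air_quality["datetime"].min()
end = air_quality["datetime"].max()
print(start, end)
```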
In this short guide, I'll show you how to extract Month and Year from a DateTime column in Pandas DataFrame. You can also find how to convert string data to a DateTime.
To start, here is the syntax that you may apply in order to extract the year and month together:
df['StartDate'].dt.to_period('M')
Let's create a DataFrame which has a single column StartDate:
import pandas as pd

dates = ['2021-08-01', '2021-08-02', '2021-08-03']
df = pd.DataFrame({'StartDate': dates})
In order to convert the strings to a Datetime column we are going to use:
df['StartDate'] = pd.to_datetime(df['StartDate'])
Applying .dt.to_period('M') to the converted column gives the result:
0    2021-08
1    2021-08
2    2021-08
What if you'd like to get the month first and then the year? In this case we will use .dt.strftime in order to produce a column with the format MM/YYYY, or any other format.
df['StartDate'].dt.strftime('%m/%Y')
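Putting the steps together, here is a small runnable sketch that shows both approaches side by side. The output column names YearMonth and MonthYear are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'StartDate': ['2021-08-01', '2021-08-02', '2021-08-03']})
df['StartDate'] = pd.to_datetime(df['StartDate'])

# to_period('M') keeps year and month together as a monthly Period.
df['YearMonth'] = df['StartDate'].dt.to_period('M')

# strftime returns plain strings in whatever layout the format string gives.
df['MonthYear'] = df['StartDate'].dt.strftime('%m/%Y')

print(df)
```

The period column stays sortable and supports date arithmetic, while the strftime column is text and is best kept for display.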
If you want to convert multiple date columns to string type, put all the date column names into a list and use it with astype().
# Below are quick examples

# Convert datetime type to string
df['ConvertedDate'] = df['DateTypeCol'].astype(str)

# Using to_datetime() & astype(): parse the original strings, then convert back to string
df['ConvertedDate'] = pd.to_datetime(df['InsertedDate'].astype(str), format='%Y/%m/%d').astype(str)

# Convert datetime to a different format
df['ConvertedDate'] = df['DateTypeCol'].dt.strftime('%m/%d/%Y')

# Using DataFrame.style.format() and a lambda function
df.style.format({"DateTypeCol": lambda t: t.strftime("%d/%m/%Y")})

# Convert multiple date columns to string type
date_columns = ["date_col1", "date_col2", "date_col3"]
df[date_columns] = df[date_columns].astype(str)

# Convert all date columns to string type, one column at a time
for col in df.select_dtypes(include=['datetime64']).columns.tolist():
    df[col] = df[col].astype(str)

# Convert all date columns to string type in a single assignment
date_columns = df.select_dtypes(include=['datetime64']).columns.tolist()
df[date_columns] = df[date_columns].astype(str)
Now, let's create a DataFrame with a few rows and columns, execute these examples and validate the results. Our DataFrame contains the column names Courses, Fee and InsertedDate.
import pandas as pd

technologies = {
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'Fee': [22000, 25000, 23000],
    'InsertedDate': ["2021/11/24", "2021/11/25", "2021/11/26"]
}
df = pd.DataFrame(technologies)

# Use pandas.to_datetime() to change datetime format
df['DateTypeCol'] = pd.to_datetime(df.InsertedDate)
print(df)
This yields the output below. Note that in the DataFrame example above, I have used the pandas.to_datetime() method to convert the date in string format to the datetime type datetime64[ns], turning the InsertedDate column into the DateTypeCol column.
   Courses    Fee InsertedDate DateTypeCol
0    Spark  22000   2021/11/24  2021-11-24
1  PySpark  25000   2021/11/25  2021-11-25
2   Hadoop  23000   2021/11/26  2021-11-26
After running df['ConvertedDate'] = df['DateTypeCol'].astype(str), the dtype of column ConvertedDate will be object (string). This yields the output below.
   Courses    Fee InsertedDate DateTypeCol ConvertedDate
0    Spark  22000   2021/11/24  2021-11-24    2021-11-24
1  PySpark  25000   2021/11/25  2021-11-25    2021-11-25
2   Hadoop  23000   2021/11/26  2021-11-26    2021-11-26
You can also try this. It converts the string date to datetime and back to a string. In the example below, it converts the InsertedDate (string type) values in format %Y/%m/%d to ConvertedDate with format %Y-%m-%d.
# Parse the date strings to datetime64[ns], then convert back to string type
df['ConvertedDate'] = pd.to_datetime(df['InsertedDate'].astype(str), format='%Y/%m/%d').astype(str)
print(df)
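If the string columns should use a layout other than the default %Y-%m-%d, select_dtypes() can be combined with .dt.strftime() instead of astype(str). The following is a minimal sketch under that assumption, reusing the column names from the example above and an arbitrary %m/%d/%Y target format.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop"],
    'InsertedDate': ["2021/11/24", "2021/11/25", "2021/11/26"],
})
df['DateTypeCol'] = pd.to_datetime(df['InsertedDate'], format='%Y/%m/%d')

# Find every datetime column, then format each one as a string.
date_columns = df.select_dtypes(include=['datetime64']).columns.tolist()
for col in date_columns:
    df[col] = df[col].dt.strftime('%m/%d/%Y')

print(df)
```

InsertedDate is left untouched because it was never a datetime column; only DateTypeCol is selected and reformatted.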