How to properly handle datetime comparisons in an entire DataFrame with NaT values?


Suggestion : 1

This was a bug, fixed in the next release of pandas (0.24.0):

In [1]: import pandas as pd; pd.__version__
Out[1]: '0.24.0.dev0+1504.g9642fea9c'

In [2]: s = pd.Series([pd.NaT, pd.to_datetime('2018-10-16')])

In [3]: s > pd.to_datetime('2018-10-15')
Out[3]:
0    False
1     True
dtype: bool

In [4]: s.to_frame() > pd.to_datetime('2018-10-15')
Out[4]:
       0
0  False
1   True

In [5]: df = pd.DataFrame([
   ...:     [pd.NaT, pd.to_datetime('2018-10-16')],
   ...:     [pd.to_datetime('2018-10-16'), pd.NaT]
   ...: ])

In [6]: df >= pd.to_datetime('2018-10-15')
Out[6]:
       0      1
0  False   True
1   True  False

In [7]: df.ge(pd.to_datetime('2018-10-15'))
Out[7]:
       0      1
0  False   True
1   True  False
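If you are stuck on a pandas version older than 0.24, one possible workaround is to run the comparison column by column, since Series comparisons already handled NaT correctly (a NaT entry simply compares as False). This is a minimal sketch of that idea, not the canonical fix:

import pandas as pd

df = pd.DataFrame([
    [pd.NaT, pd.to_datetime('2018-10-16')],
    [pd.to_datetime('2018-10-16'), pd.NaT],
])
cutoff = pd.to_datetime('2018-10-15')

# Apply the comparison to each column as a Series;
# NaT entries evaluate to False instead of breaking the
# whole-frame comparison.
result = df.apply(lambda col: col >= cutoff)
print(result)
#        0      1
# 0  False   True
# 1   True  False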

Suggestion : 2

This suggestion collects several related GroupBy recipes:

• Use the resample() method to get a daily frequency in each group of your DataFrame, and fill the missing values with the ffill() method.
• Compute the size of each group with the size method, included in GroupBy; it returns a Series whose index is the group names and whose values are the group sizes.
• The filter method returns a subset of the original object, e.g. only the elements belonging to groups with a group sum greater than 2.
• To select the nth item from a DataFrame or Series, use nth(); it is a reduction method and returns a single row (or no row) per group if you pass an int for n.

A sketch of these four methods follows the GroupBy basics below.

SELECT Column1, Column2, mean(Column3), sum(Column4)
FROM SomeTable
GROUP BY Column1, Column2
(These examples assume import numpy as np and import pandas as pd.)

In [1]: df = pd.DataFrame(
   ...:     [
   ...:         ("bird", "Falconiformes", 389.0),
   ...:         ("bird", "Psittaciformes", 24.0),
   ...:         ("mammal", "Carnivora", 80.2),
   ...:         ("mammal", "Primates", np.nan),
   ...:         ("mammal", "Carnivora", 58),
   ...:     ],
   ...:     index=["falcon", "parrot", "lion", "monkey", "leopard"],
   ...:     columns=("class", "order", "max_speed"),
   ...: )

In [2]: df
Out[2]:
          class           order  max_speed
falcon     bird   Falconiformes      389.0
parrot     bird  Psittaciformes       24.0
lion     mammal       Carnivora       80.2
monkey   mammal        Primates        NaN
leopard  mammal       Carnivora       58.0

# default is axis=0
In [3]: grouped = df.groupby("class")

In [4]: grouped = df.groupby("order", axis="columns")

In [5]: grouped = df.groupby(["class", "order"])

In [6]: df = pd.DataFrame(
   ...:     {
   ...:         "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
   ...:         "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
   ...:         "C": np.random.randn(8),
   ...:         "D": np.random.randn(8),
   ...:     }
   ...: )

In [7]: df
Out[7]:
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860

In [8]: grouped = df.groupby("A")

In [9]: grouped = df.groupby(["A", "B"])

In [10]: df2 = df.set_index(["A", "B"])

In [11]: grouped = df2.groupby(level=df2.index.names.difference(["B"]))

In [12]: grouped.sum()
Out[12]:
            C         D
A
bar -1.591710 -1.739537
foo -0.752861 -1.402938

In [13]: def get_letter_type(letter):
   ....:     if letter.lower() in 'aeiou':
   ....:         return 'vowel'
   ....:     else:
   ....:         return 'consonant'
   ....:

In [14]: grouped = df.groupby(get_letter_type, axis=1)
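The resample(), size, filter, and nth() techniques listed at the top of this suggestion are not demonstrated in the session above, so here is a minimal sketch of each; the DataFrames, column names, and the threshold of 2 are invented for illustration:

import numpy as np
import pandas as pd

np.random.seed(0)  # reproducible random data for this sketch
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": np.random.randn(8),
})

# size: a Series of group sizes, indexed by the group names
print(df.groupby("A").size())

# filter: keep only rows belonging to groups whose sum of C exceeds 2
print(df.groupby("A").filter(lambda g: g["C"].sum() > 2))

# nth: one row per group; here the first (0th) row of each group
print(df.groupby("A").nth(0))

# resample per group: upsample each group to daily frequency, then ffill
ts = pd.DataFrame(
    {"key": ["a", "a", "b"], "value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2018-10-10", "2018-10-13", "2018-10-10"]),
)
print(ts.groupby("key").resample("D").ffill())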

Suggestion : 3


To drop every row that contains a NaN/NaT value:

df.dropna()

It is also possible to drop rows with NaN values only in particular columns using the subset argument:

df.dropna(subset=['column_name'], inplace=True)

Note: We can also reset the index afterwards using the reset_index() method:

df = df.reset_index(drop = True)
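Tying this back to the NaT theme of the original question, here is a minimal sketch combining both steps (the DataFrame and column names are invented for illustration):

import pandas as pd

df = pd.DataFrame({
    "start": [pd.NaT, pd.to_datetime("2018-10-16")],
    "end": [pd.to_datetime("2018-10-16"), pd.NaT],
})

# dropna treats NaT like NaN: drop rows whose 'start' is missing,
# then rebuild a clean 0..n-1 index
cleaned = df.dropna(subset=["start"]).reset_index(drop=True)
print(cleaned)
#        start end
# 0 2018-10-16 NaT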