df.describe() does not show all stats for columns of large numbers

dataset.dtypes

bignum    object
dtype: object

Your column is loaded into pandas with object dtype: 2 ** 64 and larger values do not fit in a 64-bit integer, so pandas keeps them as plain Python ints, and describe() summarises an object column as non-numeric data. The solution is:

dataset.astype(float).describe()

             bignum
count  3.000000e+00
mean   4.304240e+19
std    2.817787e+19
min    1.844674e+19
25%    2.767012e+19
50%    3.689349e+19
75%    5.534023e+19
max    7.378698e+19
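
To see why the cast is needed, the following sketch rebuilds the example frame, checks the dtype pandas inferred, and compares describe() before and after casting. The names `before` and `after` are illustrative only. Keep in mind that float64 carries roughly 15 to 17 significant digits, so the statistics for integers this large are approximate.

```python
import pandas as pd

# Same example numbers as in the question: all exceed the 64-bit integer range
dataset = pd.DataFrame(data=[2 ** 64, 2 ** 65, 2 ** 66], columns=["bignum"])

# pandas falls back to object dtype and stores plain Python ints
print(dataset.dtypes)

# On an object column, describe() reports count/unique/top/freq only
before = dataset.describe()
print(before.index.tolist())

# After casting to float64 the numeric summary becomes available
after = dataset.astype(float).describe()
print(after.index.tolist())

# float64 is approximate at this magnitude, so treat the value as rounded
print(after.loc["mean", "bignum"])
```

If exact integer arithmetic matters, the float cast may not be appropriate; it is shown here only because it is the approach the answer suggests.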

Suggestion : 2

2019-01-12 08:55, iPherian, imported from Stack Overflow. Tags: python, pandas, dataframe, statistics, bignum. Answers: 1 | Viewed 1,556 times



Suggestion : 3


I'm trying to generate statistics (among other things) for a list of bignums, but it doesn't work.

import pandas as pd

# example numbers
dataset = pd.DataFrame(data = [2 ** 64, 2 ** 65, 2 ** 66], columns = ['bignum'])
print(dataset.describe())

It prints the following instead of the statistics I wanted (standard deviation, mean, etc.), which it does show for lists of smaller numbers.

                      bignum
count                      3
unique                     3
top     36893488147419103232
freq                       1

I'd like it to say something like this:

       bignum
mean   ...
std    ...
min    ...
25%    ...
50%    ...
75%    ...
max    ...

Suggestion : 4

November 5, 2021 / March 4, 2022

Let’s load a sample dataframe to follow along with:

# Loading a sample Pandas dataframe
from seaborn import load_dataset

df = load_dataset('penguins')
print(df.head())

# Returns:
#    species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex
# 0   Adelie  Torgersen            39.1           18.7              181.0       3750.0    Male
# 1   Adelie  Torgersen            39.5           17.4              186.0       3800.0  Female
# 2   Adelie  Torgersen            40.3           18.0              195.0       3250.0  Female
# 3   Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN
# 4   Adelie  Torgersen            36.7           19.3              193.0       3450.0  Female

Let’s see what happens when we apply the method with default parameters:

# Running the Pandas dataframe.describe() method with default parameters
from seaborn import load_dataset

df = load_dataset('penguins')
print(df.describe())

# Returns:
#        bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g
# count      342.000000     342.000000         342.000000   342.000000
# mean        43.921930      17.151170         200.915205  4201.754386
# std          5.459584       1.974793          14.061714   801.954536
# min         32.100000      13.100000         172.000000  2700.000000
# 25%         39.225000      15.600000         190.000000  3550.000000
# 50%         44.450000      17.300000         197.000000  4050.000000
# 75%         48.500000      18.700000         213.000000  4750.000000
# max         59.600000      21.500000         231.000000  6300.000000

Similarly, if you only wanted to describe a single column, then you could apply the .describe() method to a Pandas series (or column). Let’s see what this looks like:

print(df['body_mass_g'].describe())

# Returns:
# count     342.000000
# mean     4201.754386
# std       801.954536
# min      2700.000000
# 25%      3550.000000
# 50%      4050.000000
# 75%      4750.000000
# max      6300.000000
# Name: body_mass_g, dtype: float64

Let’s see how we can change the method’s behaviour to include all columns:

print(df.describe(include = 'all'))

# Returns:
#         species  island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g   sex
# count       344     344      342.000000     342.000000         342.000000   342.000000   333
# unique        3       3             NaN            NaN                NaN          NaN     2
# top      Adelie  Biscoe             NaN            NaN                NaN          NaN  Male
# freq        152     168             NaN            NaN                NaN          NaN   168
# mean        NaN     NaN       43.921930      17.151170         200.915205  4201.754386   NaN
# std         NaN     NaN        5.459584       1.974793          14.061714   801.954536   NaN
# min         NaN     NaN       32.100000      13.100000         172.000000  2700.000000   NaN
# 25%         NaN     NaN       39.225000      15.600000         190.000000  3550.000000   NaN
# 50%         NaN     NaN       44.450000      17.300000         197.000000  4050.000000   NaN
# 75%         NaN     NaN       48.500000      18.700000         213.000000  4750.000000   NaN
# max         NaN     NaN       59.600000      21.500000         231.000000  6300.000000   NaN

Let’s load a different dataframe so that we can see how the datetime_is_numeric argument works. We’ll leave the value set to the default and then toggle it to True and see how it changes.

import pandas as pd

df = pd.DataFrame.from_dict({
   'Date': ['2021-12-01', '2021-12-02', '2021-12-03', '2021-12-04', '2021-12-05'],
   'Values': [100, 120, 140, 160, 180]
})

print(df.describe())

# Returns:
#            Values
# count    5.000000
# mean   140.000000
# std     31.622777
# min    100.000000
# 25%    120.000000
# 50%    140.000000
# 75%    160.000000
# max    180.000000
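
The output above skips the Date column because it still holds strings. A minimal sketch of the datetime case follows, assuming the argument being discussed is datetime_is_numeric, the keyword used in the pandas documentation excerpt on this page. As far as I know, recent pandas releases no longer accept that keyword and summarise datetimes numerically by default, so the sketch falls back to a plain describe() call when the keyword is rejected.

```python
import pandas as pd

df = pd.DataFrame.from_dict({
    'Date': ['2021-12-01', '2021-12-02', '2021-12-03', '2021-12-04', '2021-12-05'],
    'Values': [100, 120, 140, 160, 180]
})

# Convert the strings to real datetimes before summarising them
df['Date'] = pd.to_datetime(df['Date'])

try:
    # Older pandas: explicitly request mean/min/quantiles/max for datetimes
    summary = df['Date'].describe(datetime_is_numeric=True)
except TypeError:
    # Newer pandas rejects the keyword and already treats datetimes as numeric
    summary = df['Date'].describe()

print(summary)
```

Either branch should report a mean, minimum, quartiles, and maximum for the dates rather than the unique/top/freq rows used for strings.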

Suggestion : 5

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.

‘all’ : All columns of the input will be included in the output.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.

datetime_is_numeric : Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64
>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object
>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object
>>> df = pd.DataFrame({
...     'categorical': pd.Categorical(['d', 'e', 'f']),
...     'numeric': [1, 2, 3],
...     'object': ['a', 'b', 'c']
... })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0
>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN
>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64
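
The include and exclude parameters described above can also take dtype selections rather than 'all'. The sketch below reuses the documentation's three-column frame; the variable names `numeric_summary` and `non_numeric_summary` are illustrative, not part of pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'categorical': pd.Categorical(['d', 'e', 'f']),
    'numeric': [1, 2, 3],
    'object': ['a', 'b', 'c'],
})

# Restrict the summary to numeric columns (what describe() picks by default here)
numeric_summary = df.describe(include=[np.number])

# Summarise everything except the numeric columns
non_numeric_summary = df.describe(exclude=[np.number])

print(numeric_summary.columns.tolist())
print(non_numeric_summary.columns.tolist())
print(non_numeric_summary.index.tolist())
```

Selecting by np.number avoids depending on how a given pandas version labels string columns.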

Suggestion : 6

September 16, 2021

When the pandas describe function is applied to a Series object, the result is also returned as a Series.

import pandas as pd

# Create a Series
numericSeries = pd.Series([1, 4, 6, 53, 2, 2, 1, 1])

# Apply the describe function
numericSeries.describe()
count     8.000000
mean      8.750000
std      17.966238
min       1.000000
25%       1.000000
50%       2.000000
75%       4.500000
max      53.000000
dtype: float64

When the pandas describe function is applied to a dataframe, the result is also returned as a dataframe. This dataframe contains a statistical summary for all the numeric features of the original dataframe.

# Create a dataframe
df = pd.DataFrame({
   'Subject_1_Marks': [14, 42, 21, 12, 45],
   'Subject_2_Marks': [32, 43, 23, 50, 21],
   'Subject_3_Marks': [45.0, 34.0, 23.0, 8.0, 21.0],
   'Names': ['Saksham', 'Ayushi', 'Abhishek', 'Saksham', 'Saumya']
})

# Apply the describe function
df.describe()

# Check the column dtypes
df.dtypes
Subject_1_Marks      int64
Subject_2_Marks      int64
Subject_3_Marks    float64
Names               object
dtype: object

Specifying include='all' will force pandas to generate summaries for all types of features in the dataframe. Some data types like string type don’t have any mean or standard deviation. In such cases, pandas will mark them as NaN.

# describe function with include='all'
df.describe(include='all')
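
To make the NaN behaviour concrete, here is a small sketch that rebuilds the marks dataframe from above and inspects two columns of the include='all' summary. The variable name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Subject_1_Marks': [14, 42, 21, 12, 45],
    'Subject_2_Marks': [32, 43, 23, 50, 21],
    'Subject_3_Marks': [45.0, 34.0, 23.0, 8.0, 21.0],
    'Names': ['Saksham', 'Ayushi', 'Abhishek', 'Saksham', 'Saumya']
})

summary = df.describe(include='all')

# String column: count/unique/top/freq are filled, numeric rows are NaN
print(summary['Names'])

# Numeric column: mean/std/quantiles are filled, unique/top/freq are NaN
print(summary['Subject_1_Marks'])
```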

Suggestion : 7

A large number of methods collectively compute descriptive statistics and other related operations on a DataFrame. Most of these are aggregations like sum() and mean(), but some of them, like cumsum(), produce an object of the same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by name or integer.

The describe() function computes a summary of statistics pertaining to the DataFrame columns.

Functions like abs() and cumprod() throw an exception when the DataFrame contains character or string data, because such operations cannot be performed on them.

Functions like sum() and cumsum() work with both numeric and character (or string) data elements without any error. Although character aggregations are rarely used in practice, these functions do not throw any exception.
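
As a quick illustration of the axis argument and of the difference between an aggregation and a same-size result, consider this sketch. It uses a small numeric-only frame made up for the purpose (a four-row subset of the Age and Rating values used on this page).

```python
import pandas as pd

# Hypothetical numeric-only frame for illustration
df = pd.DataFrame({
    'Age': [25, 26, 25, 23],
    'Rating': [4.23, 3.24, 3.98, 2.56],
})

# The axis can be given as an integer or by name; both sum down each column
col_totals_by_int = df.sum(axis=0)
col_totals_by_name = df.sum(axis='index')

# axis='columns' (same as axis=1) sums across each row instead
row_totals = df.sum(axis='columns')

# cumsum() is not an aggregation: the result has the same shape as the input
running = df.cumsum()

print(col_totals_by_int)
print(row_totals)
print(running)
```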

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {
   'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
      'Lee', 'David', 'Gasper', 'Betina', 'Andres'
   ]),
   'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
   'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df)

Its output is as follows −

      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65

The same DataFrame, summed column by column with sum():

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {
   'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
      'Lee', 'David', 'Gasper', 'Betina', 'Andres'
   ]),
   'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
   'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
print(df.sum())

The same DataFrame, averaged column by column with mean():

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {
   'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
      'Lee', 'David', 'Gasper', 'Betina', 'Andres'
   ]),
   'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
   'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
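# numeric_only=True skips the string column 'Name', which recent pandas versions refuse to average
print(df.mean(numeric_only=True))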

The same DataFrame, with the standard deviation of each column from std():

import pandas as pd
import numpy as np

#Create a Dictionary of series
d = {
   'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
      'Lee', 'David', 'Gasper', 'Betina', 'Andres'
   ]),
   'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
   'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])
}

#Create a DataFrame
df = pd.DataFrame(d)
# numeric_only=True restricts the calculation to the numeric columns
print(df.std(numeric_only=True))