Parsing a string with KB/MB/GB etc. into a numeric value


Since GNU coreutils 8.21 (Feb 2013, so not yet present in all distributions), on non-embedded Linux and Cygwin, you can use numfmt. It doesn't produce exactly the same output format (as of coreutils 8.23, I don't think you can get 2 digits after the decimal point).

$ numfmt --to=iec-i --suffix=B --padding=7 1 177152 48832200 1975684956
     1B
 173KiB
  47MiB
 1.9GiB

I find your code a bit convoluted. Here's a cleaner awk version (the output format isn't exactly identical):

awk '

function human(x) {
   if (x < 1000) {
      return x
   } else {
      x /= 1024
   }
   s = "kMGTPEZY";   # prefixes in ascending order (P comes before E)
   while (x >= 1000 && length(s) > 1) {
      x /= 1024;
      s = substr(s, 2)
   }
   return int(x + 0.5) substr(s, 1, 1)
} {
   sub(/^[0-9]+/, human($1));
   print
}
'
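If you need the same logic outside awk, here is a rough Python port of the human() function above, offered as a sketch rather than part of the original answer. The names PREFIXES and humanize_first_field are hypothetical, and the substitution uses the matched leading digits rather than awk's whole $1.

```python
import re

# SI-style prefix letters in ascending order, as in the awk script
PREFIXES = "kMGTPEZY"


def human(x: float) -> str:
    """Mirror of awk human(): divide by 1024 until below 1000, round to an integer."""
    if x < 1000:
        return str(x)
    x /= 1024
    s = PREFIXES
    while x >= 1000 and len(s) > 1:
        x /= 1024
        s = s[1:]
    # int(x + 0.5) rounds half up for positive values, like the awk version
    return f"{int(x + 0.5)}{s[0]}"


def humanize_first_field(line: str) -> str:
    """Replace a leading run of digits, like sub(/^[0-9]+/, human($1)) in awk."""
    return re.sub(r"^[0-9]+", lambda m: human(int(m.group(0))), line, count=1)
```

For example, humanize_first_field("177152 big_file.csv") gives "173k big_file.csv".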

e.g.

printf %s\\n 5607598768908 | numfmt --to=iec-i
5.2Ti

In addition, as of coreutils 8.24, numfmt can process multiple fields with field range specifications similar to cut, and supports setting the output precision with the --format option.
e.g.

numfmt --to=iec-i --field=2,4 --format='%.3f' <<< 'tx: 180000 rx: 2000000'
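Going the other way, which is what the question title asks about, numfmt can also read these suffixes back (its --from=iec-i option). For a script, a minimal Python sketch, assuming inputs look like numfmt's IEC output with an optional trailing B; parse_iec is a hypothetical name, and it is lenient enough to accept a bare K as 1024 too:

```python
import re

# Binary (IEC) prefixes in ascending order; each step is a factor of 1024
IEC_PREFIXES = "KMGTPEZY"

# number, optional spaces, optional prefix letter with optional "i", optional "B"
_IEC_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:([KMGTPEZY])i?)?B?")


def parse_iec(text: str) -> float:
    """Turn strings such as '173KiB', '1.9Gi' or '512' into a number of bytes."""
    m = _IEC_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an IEC size: {text!r}")
    number, prefix = m.groups()
    # no prefix means plain bytes; "K" is 1024**1, "M" is 1024**2, ...
    power = IEC_PREFIXES.index(prefix) + 1 if prefix else 0
    return float(number) * 1024 ** power
```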

Here's a bash-only option (no bc or any other non-builtins) with two decimal places and binary units.

# Converts bytes value to human-readable string [$1: bytes value]
bytesToHumanReadable() {
   local i=${1:-0} d="" s=0 S=("Bytes" "KiB" "MiB" "GiB" "TiB" "PiB" "EiB" "ZiB" "YiB")
   while ((i > 1024 && s < ${#S[@]} - 1)); do
      # two truncated decimal digits of the remainder
      printf -v d ".%02d" $((i % 1024 * 100 / 1024))
      i=$((i / 1024))
      s=$((s + 1))
   done
   echo "$i$d ${S[$s]}"
}

Examples:

$ bytesToHumanReadable 123456789
117.73 MiB

$ bytesToHumanReadable 1000000000000 # '1TB of storage'
931.32 GiB

$ bytesToHumanReadable
0 Bytes

$ bytesToHumanReadable 9223372036854775807
7.99 EiB
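The function uses only integer arithmetic, which is why it truncates rather than rounds. A Python port of the same loop, offered as a sketch (bytes_to_human_readable and UNITS are hypothetical names), reproduces the examples above:

```python
# Same unit table as the bash function, ordered by increasing size
UNITS = ["Bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def bytes_to_human_readable(i: int = 0) -> str:
    """Integer-only port of bytesToHumanReadable: 1024 steps, two truncated decimals."""
    d = ""
    s = 0
    while i > 1024 and s < len(UNITS) - 1:
        # fractional part of the next division, truncated to two digits
        d = ".%02d" % (i % 1024 * 100 // 1024)
        i //= 1024
        s += 1
    return f"{i}{d} {UNITS[s]}"
```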

With GNU units, first check that the units are defined:

$ units --check-verbose | grep byte
doing 'byte'

$ units --check-verbose | grep mega
doing 'megalerg'
doing 'mega'

$ units --check-verbose | grep mebi
doing 'mebi'

Given that they are, do a conversion; printf format specifiers are accepted to format the numeric result:

$ units --one-line -o "%.15g" '20023450 bytes' 'megabytes'  # also --terse
    * 20.02345
$ units --one-line -o "%.15g" '20023450 bytes' 'mebibytes'
    * 19.0958499908447
$ units --one-line -o "%.5g" '20023450 bytes' 'mebibytes'
    * 19.096
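To do the same conversion without the units program, a small Python sketch with a hand-written unit table gives the same numbers. This is an assumption-laden stand-in: it only knows the handful of byte units listed, whereas units ships a large database, and convert and BYTES_PER_UNIT are hypothetical names.

```python
import re

# Size of each unit in bytes: SI (powers of 1000) and IEC (powers of 1024)
BYTES_PER_UNIT = {
    "byte": 1,
    "kilobyte": 1000,
    "megabyte": 1000**2,
    "gigabyte": 1000**3,
    "kibibyte": 1024,
    "mebibyte": 1024**2,
    "gibibyte": 1024**3,
}


def _unit_size(name: str) -> int:
    # accept plural spellings such as "bytes" or "mebibytes"
    return BYTES_PER_UNIT[name.strip().lower().rstrip("s")]


def convert(quantity: str, target: str) -> float:
    """Convert e.g. '20023450 bytes' into the target unit."""
    m = re.fullmatch(r"\s*([0-9.]+)\s*([A-Za-z]+)\s*", quantity)
    if m is None:
        raise ValueError(f"cannot parse quantity: {quantity!r}")
    value = float(m.group(1)) * _unit_size(m.group(2))
    return value / _unit_size(target)
```

Formatting the result with "%.15g" % convert("20023450 bytes", "mebibytes") mirrors the -o "%.15g" option above.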

Code:

bytestohuman() {
   # converts a byte count to a human readable format in IEC binary notation (base-1024),
   # rounded to two decimal places for anything larger than a byte.
   # switchable to padded format and base-1000 if desired.
   local L_BYTES="${1:-0}"
   local L_PAD="${2:-no}"
   local L_BASE="${3:-1024}"
   BYTESTOHUMAN_RESULT=$(awk -v bytes="${L_BYTES}" -v pad="${L_PAD}" -v base="${L_BASE}" '
   function human(x, pad, base) {
      if (base != 1024) base = 1000
      basesuf = (base == 1024) ? "iB" : "B"

      s = "BKMGTPEZY"   # prefixes in ascending order
      while (x >= base && length(s) > 1) {
         x /= base
         s = substr(s, 2)
      }
      s = substr(s, 1, 1)

      xf = (pad == "yes") ? ((s == "B") ? "%5d   " : "%8.2f") : ((s == "B") ? "%d" : "%.2f")
      s = (s != "B") ? (s basesuf) : ((pad == "no") ? s : ((basesuf == "iB") ? (s "  ") : (s " ")))

      return sprintf((xf " %s\n"), x, s)
   }
   BEGIN {
      print human(bytes, pad, base)
   }
   ')
   return $?
}
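The bash wrapper only passes its three arguments into awk and stores the result in BYTESTOHUMAN_RESULT. The same formatting logic as a Python function, sketched here with a hypothetical name bytes_to_human and a boolean pad in place of the yes/no strings (it returns the string without the trailing newline):

```python
def bytes_to_human(num_bytes: float = 0, pad: bool = False, base: int = 1024) -> str:
    """Port of the awk human(): 'iB' suffixes for base 1024, 'B' for base 1000."""
    if base != 1024:
        base = 1000
    basesuf = "iB" if base == 1024 else "B"

    x = num_bytes
    s = "BKMGTPEZY"
    while x >= base and len(s) > 1:
        x /= base
        s = s[1:]
    s = s[0]

    if s == "B":
        # plain byte counts are printed as integers
        number = "%5d   " % x if pad else "%d" % x
        if pad:
            suffix = "B  " if basesuf == "iB" else "B "
        else:
            suffix = "B"
    else:
        number = "%8.2f" % x if pad else "%.2f" % x
        suffix = s + basesuf
    return f"{number} {suffix}"
```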

There are a couple of Perl modules on CPAN, Format::Human::Bytes and Number::Bytes::Human; the latter is a bit more complete:

$ echo 100 1000 100000 100000000 |
   perl -M'Number::Bytes::Human format_bytes' -pe 's/\d{3,}/format_bytes($&)/ge'
100 1000 98K 96M

$ echo 100 1000 100000 100000000 |
   perl -M'Number::Bytes::Human format_bytes' -pe 's/\d{3,}/format_bytes($&, bs => 1000, round_style => "round", precision => 2)/ge'
100 1.00k 100k 100M

And the reverse:

$ echo 100 1.00k 100K 100M 1Z |
   perl -M'Number::Bytes::Human parse_bytes' -pe '
      s/[\d.]+[kKMGTPEZY]/parse_bytes($&)/ge'
100 1024 102400 104857600 1.18059162071741e+21
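For comparison, a Python sketch of the same reverse conversion. It assumes, as the Perl output above shows, that every suffix is a power of 1024 and that k and K mean the same thing; parse_bytes and expand_sizes here are hypothetical names, not part of any library. The "%.15g" formatting is chosen to match how the large value is printed above.

```python
import re

# Suffix letters in ascending order; lowercase "k" is upper-cased before lookup
SUFFIXES = "KMGTPEZY"


def parse_bytes(text: str) -> float:
    """Expand a size such as '100K' or '1.00k' into bytes, using 1024 per step."""
    number, letter = text[:-1], text[-1].upper()
    return float(number) * 1024 ** (SUFFIXES.index(letter) + 1)


def expand_sizes(line: str) -> str:
    # Python version of the Perl substitution: every <number><suffix> token is expanded
    return re.sub(r"[\d.]+[kKMGTPEZY]", lambda m: "%.15g" % parse_bytes(m.group(0)), line)
```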

Suggestion : 2

Post date March 20, 2021

To create our own function, we write:

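const formatBytes = (bytes, decimals = 2) => {
   if (bytes === 0) {
      return '0 Bytes';
   }
   const k = 1024; // 1024-based steps, even though the labels read KB, MB, ...
   const dm = decimals < 0 ? 0 : decimals;
   const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB', 'PB', 'EB', 'ZB', 'YB'];
   // index of the largest unit that fits, clamped to the last entry of sizes
   const i = Math.min(Math.floor(Math.log(bytes) / Math.log(k)), sizes.length - 1);
   // toFixed rounds; parseFloat then drops trailing zeros ("1.50" -> 1.5)
   return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i];
}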

console.log(formatBytes(1024))
console.log(formatBytes(1024 * 1024))
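The same function in Python, as a sketch (format_bytes and SIZES are hypothetical names). One deliberate change: the unit index comes from an integer loop instead of Math.log, a swapped-in technique that rules out floating-point rounding putting an exact power of 1024 into the wrong unit. Like the original, it uses 1024-based steps with KB/MB labels.

```python
SIZES = ["Bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def format_bytes(num_bytes: int, decimals: int = 2) -> str:
    """Python take on formatBytes: 1024-based steps, trailing zeros dropped."""
    if num_bytes == 0:
        return "0 Bytes"
    k = 1024
    dm = max(decimals, 0)
    # largest unit that fits, found with integer comparisons
    i = 0
    while i < len(SIZES) - 1 and num_bytes >= k ** (i + 1):
        i += 1
    text = f"{num_bytes / k ** i:.{dm}f}"
    # mimic parseFloat(...toFixed(dm)): "1.50" -> "1.5", "1.00" -> "1"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return f"{text} {SIZES[i]}"
```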

Suggestion : 3

In this article, we will discuss different ways to get file size in human-readable formats like Bytes, Kilobytes (KB), Megabytes (MB), Gigabytes (GB), etc. If the file does not exist at the given path, all of the functions below that get the file size will raise an error. Therefore, we should first check whether the file exists and only then check its size. Python's os.path module provides a getsize() function for this:

os.path.getsize(path)

Let’s use this function to get the size of a file in bytes:

import os

def get_file_size_in_bytes(file_path):
    """ Get size of file at given path in bytes"""
    size = os.path.getsize(file_path)
    return size

file_path = 'big_file.csv'

size = get_file_size_in_bytes(file_path)
print('File size in bytes : ', size)

File size in bytes: 166908268

Let’s use the pathlib module to get the size of a file in bytes:

from pathlib import Path

def get_file_size_in_bytes_3(file_path):
    """ Get size of file at given path in bytes"""
    # get file object
    file_obj = Path(file_path)
    # Get file size from stat object of file
    size = file_obj.stat().st_size
    return size

file_path = 'big_file.csv'

size = get_file_size_in_bytes_3(file_path)
print('File size in bytes : ', size)
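The introduction recommends checking that the file exists before asking for its size, since both functions above raise an error for a missing path. A minimal sketch of that check, using a hypothetical helper name get_file_size_or_none:

```python
import os
from typing import Optional


def get_file_size_or_none(file_path: str) -> Optional[int]:
    """Return the size in bytes, or None when the path is not an existing regular file."""
    if not os.path.isfile(file_path):
        return None
    return os.path.getsize(file_path)
```

Wrapping os.path.getsize in try/except OSError is an alternative that also covers the file disappearing between the check and the call.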

We have created a function to convert bytes into kilobytes (KB), megabytes (MB) or gigabytes (GB):

import enum

# Enum for size units
class SIZE_UNIT(enum.Enum):
    BYTES = 1
    KB = 2
    MB = 3
    GB = 4

def convert_unit(size_in_bytes, unit):
    """ Convert the size from bytes to other units like KB, MB or GB"""
    if unit == SIZE_UNIT.KB:
        return size_in_bytes / 1024
    elif unit == SIZE_UNIT.MB:
        return size_in_bytes / (1024 * 1024)
    elif unit == SIZE_UNIT.GB:
        return size_in_bytes / (1024 * 1024 * 1024)
    else:
        return size_in_bytes
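convert_unit only goes from bytes to a larger unit. For the reverse, which is the parsing problem in the title, here is a sketch that reuses the same 1024-based factors. The names FACTORS and parse_size are hypothetical, and rounding to whole bytes is an assumption.

```python
import re

# Same 1024-based factors as convert_unit, keyed by the unit label
FACTORS = {"B": 1, "BYTES": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Inverse of convert_unit: turn '1.5 KB' or '2GB' into a whole number of bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if m is None or m.group(2).upper() not in FACTORS:
        raise ValueError(f"unrecognised size: {text!r}")
    return round(float(m.group(1)) * FACTORS[m.group(2).upper()])
```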

Suggestion : 4

Important: this formula assumes that the units are the last 2 characters of a string that contains both a number and a unit of measure.

=LEFT(A1,LEN(A1)-2)/10^((MATCH(RIGHT(A1,2),{"PB","TB","GB","MB","KB"},0)-3)*3)

To normalize units to gigabytes (or megabytes, kilobytes, etc.) you can use a clever formula based on the MATCH, LEFT, and RIGHT functions. In the example shown, the formula in C5 is:

=LEFT(B5,LEN(B5)-2)/10^((MATCH(RIGHT(B5,2),{"PB","TB","GB","MB","KB"},0)-3)*3)

At the core, this formula separates the number part of the size from the unit, then divides the number by the appropriate divisor to normalize to Gigabytes. The divisor is calculated as a power of 10, so the formula reduces to this:

=number/10^power

To get the number, the formula extracts all characters from the left up to but not including the units:

LEFT(B5, LEN(B5) - 2)

To get "power", the formula matches on the unit in a hard-coded array constant:

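MATCH(RIGHT(B5,2),{"PB","TB","GB","MB","KB"},0)

MATCH returns the position of the unit in the array (1 for PB through 5 for KB). Subtracting 3 and multiplying by 3 turns that position into the power of 10: -6 for PB, -3 for TB, 0 for GB, 3 for MB and 6 for KB.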
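If the same normalization is needed outside a spreadsheet, the formula translates almost line for line into Python. This sketch assumes the same input shape (a number followed by a two-letter unit) and the same decimal powers of 1000; normalize_to_gb is a hypothetical name. The unit is upper-cased because an exact MATCH in Excel is not case-sensitive.

```python
# Unit order used by the array constant {"PB","TB","GB","MB","KB"}
UNIT_ORDER = ["PB", "TB", "GB", "MB", "KB"]


def normalize_to_gb(size: str) -> float:
    """Python version of the C5 formula: number / 10^power."""
    number = float(size[:-2])                          # LEFT(B5, LEN(B5) - 2)
    position = UNIT_ORDER.index(size[-2:].upper()) + 1  # MATCH(...) is 1-based
    power = (position - 3) * 3                         # GB -> 0, MB -> 3, TB -> -3, ...
    return number / 10 ** power
```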