Loading a requests text response into a pandas DataFrame


Suggestion : 1

Try this:

import requests
import pandas as pd
import io

# url is the address of the CSV you want to fetch
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))

I think you can use read_csv with url:

pd.read_csv(url)

filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)

The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv

import pandas as pd
import io
import requests

url = 'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r))

If that raises an error (StringIO expects a string, not the Response object), update the last line to use r.text:

import pandas as pd
import io
import requests

url = 'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))

Using "read_csv with url" worked:

import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())

Suggestion : 2

Let us first import the necessary packages, requests and pandas. Reading JSON data in Python is easy: the JSON can come from a file or from a web link. Here we fetch the covid19 time-series data from the JSON link pomber.github.io/covid19/timeseries.json using requests. The easiest way to turn the result into a DataFrame is the pd.DataFrame.from_dict method; that step is sketched after the snippet below.

import requests
import pandas as pd
data = requests.get('https://pomber.github.io/covid19/timeseries.json')
type(data)        # requests.models.Response
jsondata = data.json()
len(jsondata)     # number of top-level keys in the JSON
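The text above mentions pd.DataFrame.from_dict but the snippet stops before using it. A minimal continuation sketch, assuming jsondata is the dict returned by data.json() above and that the feed maps country names to lists of daily records ("Brazil" is used purely as an illustrative key):

import pandas as pd

# whole feed: one column per country, each cell a dict of daily figures
df = pd.DataFrame.from_dict(jsondata)
print(df.head())

# a single country is usually more convenient as a flat table
# ("Brazil" is an assumed key of the feed)
brazil = pd.DataFrame(jsondata["Brazil"])
print(brazil.head())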

Suggestion : 3


Save your script as request.py and run it with:

python request.py

There are many libraries for making an HTTP request in Python, such as httplib, urllib, httplib2 and treq, but requests is one of the best and has the most convenient features. If any attribute of the response comes back empty, check the status code on the Response object:

response.status_code
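As a concrete illustration, here is a minimal sketch of what such a request.py could look like: it fetches the CSV used earlier in this article, checks response.status_code before parsing, and only then builds the DataFrame (the URL and the behaviour on failure are just examples):

# request.py - fetch a CSV over HTTP and load it into a DataFrame
import io

import pandas as pd
import requests

url = "https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv"
response = requests.get(url)

if response.status_code == 200:  # 200 means the request succeeded
    df = pd.read_csv(io.StringIO(response.text))
    print(df.head())
else:
    print("Request failed with status", response.status_code)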

Suggestion : 4

Trying to load text from a requests response into a pandas DataFrame. The text is separated by semicolons; Domain and Url (the first line) should become the column names, and everything else becomes rows in the DataFrame.

url = "https://api.semrush.com/"

parameters = {
   "type": "phrase_organic",
   "key": "*****",
   "phrase": phrase,
   "database": "us",
   "display_limit": 2,
   "export_columns": "Dn,Ur"
}

response = requests.get(url, params = parameters)
urldata = response.text

dF = pd.read_csv(urldata)

The response text looks like this...

Domain;Url
facebook.com;https://facebook.com/home
instagram.com;https://instagram.com/home

You can either save your data on disk first and then load it with pandas (that variant is sketched after the StringIO example below), or use StringIO, since pd.read_csv takes a file or a buffer as input, not a plain string:

import pandas as pd
from io import StringIO

pd.read_csv(StringIO(urldata), sep=';')
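The save-to-disk alternative would look roughly like this (a sketch, assuming urldata still holds response.text from the SEMrush request; the file name is arbitrary):

import pandas as pd

# write the raw response text to a file, then let pandas read it from disk
with open("semrush_export.csv", "w", encoding="utf-8") as f:
    f.write(urldata)

df = pd.read_csv("semrush_export.csv", sep=";")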

Suggestion : 5

If the file or header contains duplicate names, pandas will by default distinguish between them so as to prevent overwriting data: duplicate columns are renamed to ‘X’, ‘X.1’, …, ‘X.N’ rather than being left as ‘X’, …, ‘X’. The excerpt from the pandas IO docs below also covers filtering columns (usecols), skipping rows, and controlling dtypes and converters.
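As a small illustration of that renaming (not part of the docs excerpt below, but it is the default behaviour of read_csv):

from io import StringIO
import pandas as pd

# the header repeats "a"; pandas renames the second occurrence to "a.1"
data = "a,b,a\n0,1,2\n3,4,5"
print(pd.read_csv(StringIO(data)))
#    a  b  a.1
# 0  0  1    2
# 1  3  4    5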

In [1]: import pandas as pd

In [2]: from io import StringIO

In [3]: data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

In [4]: pd.read_csv(StringIO(data))
Out[4]: 
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

In [5]: pd.read_csv(StringIO(data), usecols=lambda x: x.upper() in ["COL1", "COL3"])
Out[5]: 
  col1  col3
0    a     1
1    a     2
2    c     3

In [6]: data = "col1,col2,col3\na,b,1"

In [7]: df = pd.read_csv(StringIO(data))

In [8]: df.columns = [f"pre_{col}" for col in df.columns]

In [9]: df
Out[9]: 
  pre_col1 pre_col2  pre_col3
0        a        b         1

In [10]: data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

In [11]: pd.read_csv(StringIO(data))
Out[11]: 
  col1 col2  col3
0    a    b     1
1    a    b     2
2    c    d     3

In [12]: pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
Out[12]: 
  col1 col2  col3
0    a    b     2

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

In [21]: data = "col_1\n1\n2\n'A'\n4.22"

In [22]: df = pd.read_csv(StringIO(data), converters={"col_1": str})

In [23]: df
Out[23]: 
  col_1
0     1
1     2
2   'A'
3  4.22

In [24]: df["col_1"].apply(type).value_counts()
Out[24]: 
<class 'str'>    4
Name: col_1, dtype: int64
In [25]: df2 = pd.read_csv(StringIO(data))

In [26]: df2["col_1"] = pd.to_numeric(df2["col_1"], errors="coerce")

In [27]: df2
Out[27]: 
   col_1
0   1.00
1   2.00
2    NaN
3   4.22

In [28]: df2["col_1"].apply(type).value_counts()
Out[28]:
<class 'float'>    4
Name: col_1, dtype: int64

Suggestion : 6

Part one of this series focuses on requesting and wrangling HTML using two of the most popular Python libraries for web scraping: requests and BeautifulSoup. Right now, the easiest way to get all pages is simply to make a list of the three pages by hand and loop over them; a project with thousands of pages would need a more automated way of constructing or finding the next URLs, but for now this works. Since there is nothing in the site's robots.txt that disallows scraping this section, it should be fine to extract this data for the project. Let's request the first page:
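A sketch of that manual list-and-loop approach, with hypothetical page URLs standing in for the three real ones (the actual addresses would come from the site's pagination links):

import requests

# hypothetical page URLs -- replace with the real pagination links
pages = [
    'https://www.allsides.com/media-bias/media-bias-ratings',
    'https://www.allsides.com/media-bias/media-bias-ratings?page=1',
    'https://www.allsides.com/media-bias/media-bias-ratings?page=2',
]

responses = [requests.get(page) for page in pages]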

def save_html(html, path):
    with open(path, 'wb') as f:
        f.write(html)

save_html(r.content, 'google_com')

def open_html(path):
    with open(path, 'rb') as f:
        return f.read()

html = open_html('google_com')
User-agent: *
Crawl-delay: 10
Allow: /pages/
Disallow: /scripts/

# more stuff
Disallow: *
Allow: /pages/
import requests

url = 'https://www.allsides.com/media-bias/media-bias-ratings'

r = requests.get(url)

print(r.content[:100])
b'<!DOCTYPE html>\n<!--[if IEMobile 7]><html class="iem7"  lang="en" dir="ltr"><![endif]-->\n<!--[if lte'
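Tying this back to the article's theme: if the fetched page contains an HTML <table>, pandas can turn the response into DataFrames directly. A sketch, assuming the request succeeds, the page really has a table, and an HTML parser such as lxml or html5lib/BeautifulSoup is installed:

import io

import pandas as pd
import requests

url = 'https://www.allsides.com/media-bias/media-bias-ratings'
r = requests.get(url)

# read_html returns one DataFrame per <table> found in the markup
tables = pd.read_html(io.StringIO(r.text))
print(len(tables))
print(tables[0].head())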

Suggestion : 7

Read JSON strings in pandas with read_json(). This works for URLs, files, compressed files and anything else in JSON format; pandas loads the data straight into a DataFrame. A DataFrame can also be saved as a JSON file with the to_json(filename) method, and JSON stored in a file can be loaded back into a DataFrame (a sketch of that round trip follows the last snippet below).

# load pandas and json modules
import pandas as pd
import json

# json string
s = '{"col1":{"row1":1,"row2":2,"row3":3},"col2":{"row1":"x","row2":"y","row3":"z"}}'

# read json to data frame
df = pd.read_json(s)
print(df)

import pandas as pd

url = "https://api.exchangerate-api.com/v4/latest/USD"
df = pd.read_json(url)
print(df)

import pandas as pd
import json

df = pd.DataFrame([1, 2, 3])
df.to_json('example.json')
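The round trip works in the other direction too: JSON stored in a file can be loaded back into a DataFrame. A minimal sketch using the example.json file written by the previous snippet:

import pandas as pd

# read the JSON file produced by to_json() back into a DataFrame
df = pd.read_json('example.json')
print(df)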