How to save a pandas DataFrame into an existing PDF from PdfPages

Just create a plot of the table, then save that. Given a dataframe such as:

import pandas as pd

df = pd.DataFrame()
df['Animal'] = ['Cow', 'Bear']
df['Weight'] = [250, 450]
df['Favorite'] = ['Grass', 'Honey']
df['Least Favorite'] = ['Meat', 'Leaves']

which looks like:

  Animal  Weight Favorite Least Favorite
0    Cow     250    Grass           Meat
1   Bear     450    Honey         Leaves

you can plot a table version of it like so:

import matplotlib.pyplot as plt

fig = plt.figure(figsize = (9, 2))
ax = plt.subplot(111)
ax.axis('off')
ax.table(cellText = df.values, colLabels = df.columns, bbox = [0, 0, 1, 1])
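To get the table into the same PDF as the plots, the table figure can be handed to savefig on the already-open PdfPages object, like any other figure. A minimal sketch, assuming a PdfPages object named report as in the question; the file name report.pdf, the Agg backend and the example line plot are illustrative choices, not part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame({
    "Animal": ["Cow", "Bear"],
    "Weight": [250, 450],
    "Favorite": ["Grass", "Honey"],
    "Least Favorite": ["Meat", "Leaves"],
})

report = PdfPages("report.pdf")  # hypothetical output file name

# Page 1: an ordinary plot (illustrative data)
plot_fig, plot_ax = plt.subplots()
plot_ax.plot([0, 1, 2], [0, 1, 4])
report.savefig(plot_fig)
plt.close(plot_fig)

# Page 2: the DataFrame drawn as a table on an otherwise empty axes
table_fig, table_ax = plt.subplots(figsize=(9, 2))
table_ax.axis("off")
table_ax.table(cellText=df.values, colLabels=df.columns, bbox=[0, 0, 1, 1])
report.savefig(table_fig)
plt.close(table_fig)

pages_written = report.get_pagecount()
report.close()  # the PDF file is only finalized on close
```

Each savefig call appends one page, so plots and tables can be interleaved in any order before close() is called.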

Suggestion : 3

I have created a PDF that saves several plots created using Matplotlib. I did the following to create the PDF:

from matplotlib.backends.backend_pdf import PdfPages
report = PdfPages('report.pdf')

After creating a plot, I call report.savefig() each time. However, I also want to output the dataframes I generated into the PDF. Essentially, I want a report that contains plots and queried dataframes all in one place. Is it possible to add a dataframe to the PDF created with PdfPages and, if so, how would I do so? If not, is there another approach that would allow the plots and dataframes to be in one place (without having to save individual components and piece them together)? Would love any suggestions and examples. Thanks!

It occurred to me that you might want to plot images and tables on the same figure; the tutorial that the example image came from has some example code to help get you started.
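Putting a plot and a table on the same figure, so that both land on one PDF page, could look roughly like this. It is a sketch rather than the code from the tutorial mentioned in the answer; the two-row subplot layout, the bar chart, the file name plot_and_table.pdf and the Agg backend are assumptions made for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame({"Animal": ["Cow", "Bear"], "Weight": [250, 450]})

# One figure, two stacked axes: a plot on top, the table underneath
fig, (plot_ax, table_ax) = plt.subplots(2, 1, figsize=(6, 6))
plot_ax.bar(df["Animal"], df["Weight"])
plot_ax.set_ylabel("Weight")

table_ax.axis("off")  # hide the axes frame so only the table is visible
table = table_ax.table(cellText=df.values.tolist(),
                       colLabels=list(df.columns),
                       loc="center")

# PdfPages also works as a context manager and closes the file on exit
with PdfPages("plot_and_table.pdf") as report:
    report.savefig(fig)
    page_count = report.get_pagecount()
plt.close(fig)
```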

Suggestion : 4

How to save a pandas DataFrame as PDF: one route uses matplotlib, importing pandas, numpy, matplotlib.pyplot and PdfPages from matplotlib.backends.backend_pdf. The matplotlib library is used to export the DataFrame to a table first, and its own functionality then exports the table to PDF. Another route uses pdfkit: read the data into a DataFrame (for example df = pd.read_excel('basic_salary_1_P2.xlsx', index_col=0)) and use pdfkit to convert the file into a PDF.


# Attempt 1: pdfkit -- not working somehow, even though I downloaded
# wkhtmltopdf to several different locations
import pdfkit as pdf
config = pdf.configuration(wkhtmltopdf = r"C:\Program Files\wkhtmltopdin\wkhtmltopdf.exe")
pdf.from_url('http://google.com', 'out.pdf', configuration = config)

# Attempt 2: weasyprint -- not working, fails with
# "dlopen() failed to load a library: cairo / cairo-2 / cairo-gobject-2"
# (tried several times to solve this issue, but cannot download the library)
import pandas as pd
from weasyprint import HTML
HTML(string = pd.read_csv('cor.csv').to_html()).write_pdf("report.pdf")

# PyQt4: print an HTML file (such as the output of df.to_html()) to PDF
import sys
from PyQt4.QtGui import QTextDocument, QPrinter, QApplication

app = QApplication(sys.argv)
doc = QTextDocument()
location = "c://apython//Jim//html//notes.html"
html = open(location).read()
doc.setHtml(html)
printer = QPrinter()
printer.setOutputFileName("foo.pdf")
printer.setOutputFormat(QPrinter.PdfFormat)
printer.setPageSize(QPrinter.A4)
printer.setPageMargins(15, 15, 15, 15, QPrinter.Millimeter)
doc.print_(printer)
print("done!")

# matplotlib: draw the DataFrame as a table and save the figure with PdfPages
import matplotlib.backends.backend_pdf
import matplotlib.pyplot as plt
import pandas as pd

d = {'x{}'.format(i): range(30) for i in range(10)}
table = pd.DataFrame(d)
fig = plt.figure()
ax = fig.add_subplot(111)
cell_text = []
for row in range(len(table)):
    cell_text.append(table.iloc[row])
ax.table(cellText = cell_text, colLabels = table.columns, loc = 'center')
ax.axis('off')
pdf = matplotlib.backends.backend_pdf.PdfPages("output.pdf")
pdf.savefig(fig)
pdf.close()

Suggestion : 5

This is a solution with an intermediate HTML file. The table is pretty printed with some minimal CSS. The PDF conversion is done with weasyprint; you need to pip install weasyprint. I did not use pdfkit, because I had some problems with it on a headless machine. But weasyprint is great.

First plot the table with matplotlib, then generate the PDF:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

df = pd.DataFrame(np.random.random((10, 3)), columns = ("col 1", "col 2", "col 3"))

# https://stackoverflow.com/questions/32137396/how-do-i-plot-only-a-table-in-matplotlib
fig, ax = plt.subplots(figsize = (12, 4))
ax.axis('tight')
ax.axis('off')
the_table = ax.table(cellText = df.values, colLabels = df.columns, loc = 'center')

# https://stackoverflow.com/questions/4042192/reduce-left-and-right-margins-in-matplotlib-plot
pp = PdfPages("foo.pdf")
pp.savefig(fig, bbox_inches = 'tight')
pp.close()

Here is how I do it from an SQLite database using sqlite3, pandas and pdfkit:

import pandas as pd
import pdfkit as pdf
import sqlite3

con = sqlite3.connect("baza.db")

df = pd.read_sql_query("select * from dobit", con)
df.to_html('/home/linux/izvestaj.html')
nazivFajla = '/home/linux/pdfPrintOut.pdf'
pdf.from_file('/home/linux/izvestaj.html', nazivFajla)
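pdfkit is a wrapper around the external wkhtmltopdf program, so calls like the one above fail when that executable cannot be found. A small pre-flight check using only the standard library might help narrow such failures down; the helper name find_wkhtmltopdf is hypothetical and not part of pdfkit:

```python
import shutil

def find_wkhtmltopdf(candidate="wkhtmltopdf"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(candidate)

path = find_wkhtmltopdf()
if path is None:
    # pdfkit can still be pointed at an explicit location via
    # pdfkit.configuration(wkhtmltopdf=...)
    print("wkhtmltopdf not found on PATH")
else:
    print("wkhtmltopdf found at", path)
```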

The simple CSS code, saved in the same folder as the ipynb:

/* includes alternating gray and white with on-hover color */

.mystyle {
    font-size: 11pt;
    font-family: Arial;
    border-collapse: collapse;
    border: 1px solid silver;
}

.mystyle td, th {
    padding: 5px;
}

.mystyle tr:nth-child(even) {
    background: #E0E0E0;
}

.mystyle tr:hover {
    background: silver;
    cursor: pointer;
}

The python code:

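import os
import numpy as np
import pandas as pd
from weasyprint import HTML

# folder and file_pdf (output directory and PDF file name) are defined elsewhere
pdf_filepath = os.path.join(folder, file_pdf)
demo_df = pd.DataFrame(np.random.random((10, 3)), columns = ("col 1", "col 2", "col 3"))

# Render the DataFrame as an HTML table carrying the CSS class defined above
table = demo_df.to_html(classes='mystyle')

html_string = f'''
<html>
  <head>
    <title>HTML Pandas Dataframe with CSS</title>
    <link rel="stylesheet" type="text/css" href="df_style.css"/>
  </head>
  <body>
    {table}
  </body>
</html>
'''

HTML(string=html_string).write_pdf(pdf_filepath, stylesheets=["df_style.css"])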

The PDF conversion is done with weasyprint. You need to pip install weasyprint.

# Create a pandas dataframe with demo data:
import pandas as pd
demodata_csv = 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv'
df = pd.read_csv(demodata_csv)

# Pretty print the dataframe as an html table to a file
intermediate_html = '/tmp/intermediate.html'
to_html_pretty(df,intermediate_html,'Iris Data')
# if you do not want pretty printing, just use pandas:
# df.to_html(intermediate_html)

# Convert the html file to a pdf file using weasyprint
import weasyprint
out_pdf= '/tmp/demo.pdf'
weasyprint.HTML(intermediate_html).write_pdf(out_pdf)

# This is the table pretty printer used above (define it, together with
# the two templates below, before running the calls above):

def to_html_pretty(df, filename='/tmp/out.html', title=''):
    '''
    Write an entire dataframe to an HTML file
    with nice formatting.
    Thanks to @stackoverflowuser2010 for the
    pretty printer see https://stackoverflow.com/a/47723330/362951
    '''
    ht = ''
    if title != '':
        ht += '<h2> %s </h2>\n' % title
    ht += df.to_html(classes='wide', escape=False)

    with open(filename, 'w') as f:
         f.write(HTML_TEMPLATE1 + ht + HTML_TEMPLATE2)

HTML_TEMPLATE1 = '''
<html>
<head>
<style>
  h2 {
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
  }
  table { 
    margin-left: auto;
    margin-right: auto;
  }
  table, th, td {
    border: 1px solid black;
    border-collapse: collapse;
  }
  th, td {
    padding: 5px;
    text-align: center;
    font-family: Helvetica, Arial, sans-serif;
    font-size: 90%;
  }
  table tbody tr:hover {
    background-color: #dddddd;
  }
  .wide {
    width: 90%; 
  }
</style>
</head>
<body>
'''

HTML_TEMPLATE2 = '''
</body>
</html>
'''

When using Matplotlib, here's how to get a prettier table with alternating colors for the rows, etc., as well as to optionally paginate the PDF:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

def _draw_as_table(df, pagesize):
    alternating_colors = [
        ['white'] * len(df.columns), ['lightgray'] * len(df.columns)
    ] * len(df)
    alternating_colors = alternating_colors[:len(df)]
    fig, ax = plt.subplots(figsize = pagesize)
    ax.axis('tight')
    ax.axis('off')
    the_table = ax.table(cellText = df.values,
                         rowLabels = df.index,
                         colLabels = df.columns,
                         rowColours = ['lightblue'] * len(df),
                         colColours = ['lightblue'] * len(df.columns),
                         cellColours = alternating_colors,
                         loc = 'center')
    return fig

def dataframe_to_pdf(df, filename, numpages = (1, 1), pagesize = (11, 8.5)):
    with PdfPages(filename) as pdf:
        nh, nv = numpages
        rows_per_page = len(df) // nh
        cols_per_page = len(df.columns) // nv
        for i in range(0, nh):
            for j in range(0, nv):
                page = df.iloc[(i * rows_per_page): min((i + 1) * rows_per_page, len(df)),
                               (j * cols_per_page): min((j + 1) * cols_per_page, len(df.columns))]
                fig = _draw_as_table(page, pagesize)
                if nh > 1 or nv > 1:
                    # Add a part / page number at bottom-center of page
                    fig.text(0.5, 0.5 / pagesize[0],
                             "Part-{}x{}: Page-{}".format(i + 1, j + 1, i * nv + j + 1),
                             ha = 'center', fontsize = 8)
                pdf.savefig(fig, bbox_inches = 'tight')

                plt.close()

Use it as follows:

dataframe_to_pdf(df, 'test_1.pdf')
dataframe_to_pdf(df, 'test_6.pdf', numpages = (3, 2))
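Note that dataframe_to_pdf sizes its pages with integer division, so when the row or column count is not a multiple of the page count, the trailing rows or columns are left out (10 rows over 3 pages gives 3 rows per page, and the tenth row is never drawn). One possible variant, not part of the original answer, rounds the page size up instead. A sketch of just the slicing arithmetic, using the hypothetical helper name page_bounds:

```python
def page_bounds(n_items, n_pages):
    """Split range(n_items) into at most n_pages contiguous (start, stop)
    slices that together cover every item."""
    per_page = -(-n_items // n_pages)  # ceiling division
    bounds = []
    for page in range(n_pages):
        start = page * per_page
        stop = min(start + per_page, n_items)
        if start < stop:  # skip pages that would be empty
            bounds.append((start, stop))
    return bounds

print(page_bounds(10, 3))  # [(0, 4), (4, 8), (8, 10)]
```

Inside dataframe_to_pdf, bounds computed this way for the rows and for the columns could be passed to df.iloc in place of the i * rows_per_page and j * cols_per_page expressions.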