lookup and replace symbols in one dataframe with values from another dataframe across thousands of columns

You need df.replace() for this use case: pandas treats the second DataFrame as a nested {column: {old value: new value}} mapping, aligning on column names and using the other frame's index as the lookup key, so you get the desired output:

df_new = df1.replace(df2)
print(df_new)

   2001-10-01  2001-11-01  2001-12-01
0        0.05        0.04        0.03
1       -0.04        0.03        0.06
2        0.04        0.01        0.05
3        0.04        0.03        0.01

Example below:

print(df1)

  2001-10-01 2001-11-01 2001-12-01
0       AAPL         MS       AAPL
1        MRK       AAPL       AMZN
2       AMZN       MSFT        MRK
3       MSFT       AAPL       MSFT

print(df2)

      2001-10-01  2001-11-01  2001-12-01
AAPL        0.05        0.03        0.03
MSFT        0.04        0.01        0.01
MRK        -0.04       -0.07        0.05
MS          0.02        0.04        0.08
GS          0.01        0.02        0.10
AMZN        0.04        0.02        0.06

df_new = df1.replace(df2)
print(df_new)

   2001-10-01  2001-11-01  2001-12-01
0        0.05        0.04        0.03
1       -0.04        0.03        0.06
2        0.04        0.01        0.05
3        0.04        0.03        0.01
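The mapping can also be spelled out explicitly. The sketch below uses a smaller, made-up pair of frames and passes the equivalent nested dict (via to_dict()) to replace(); the result is the same as passing the DataFrame directly:

```python
import pandas as pd

# Returns table: index = symbol, columns = dates (as in df2 above)
returns = pd.DataFrame(
    {"2001-10-01": [0.05, 0.04], "2001-11-01": [0.03, 0.01]},
    index=["AAPL", "MSFT"],
)

# Holdings table: same date columns, cells hold symbols (as in df1 above)
holdings = pd.DataFrame(
    {"2001-10-01": ["AAPL", "MSFT"], "2001-11-01": ["MSFT", "AAPL"]}
)

# to_dict() yields the nested {column: {symbol: value}} mapping that
# replace() applies column by column
out = holdings.replace(returns.to_dict())
print(out)
```

Each cell in a given date column is looked up against that same column of the returns table, which is what makes this work across thousands of columns without an explicit loop.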

Example :

>>> df
   A    B
0  1  400
1  2  500
2  3  600

>>> df2
   B  C
0  4  7
1  5  8
2  6  9

Using df.update():

>>> df.update(df2)

Result:

>>> df
   A  B
0  1  4
1  2  5
2  3  6
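df.update() aligns on both index and column labels, modifies the frame in place, and only overwrites cells where the other frame has non-NA values. A minimal sketch (the frames here are invented to show the NA behavior):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [1, 2, 3], "B": [400, 500, 600]})
df2 = pd.DataFrame({"B": [4, np.nan, 6], "C": [7, 8, 9]})

# Shared labels (column "B", rows 0-2) are overwritten in place;
# the NaN in df2 leaves df's original 500 untouched;
# column "C" is ignored because df has no such column
df.update(df2)
print(df)
```

Note that update() returns None; the mutation happens on df itself, and overwritten columns may be upcast (e.g. to float) to accommodate the incoming values.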


Suggestion : 3

The query function takes three arguments:

  • DataTable is the table that contains the columns which we want to find and replace values in.
  • FindReplaceTable is a two-column table: the first column contains values to find and the second column contains values to replace them with. Each row in the table is one pair of find and replace values.
  • DataTableColumn is a list of the column names which we want to find and replace values in.

The function converts FindReplaceTable to a list of find-and-replace pairs, then iterates through them, applying Table.ReplaceValue to each pair.

A single Table.ReplaceValue call replaces every instance of the value in the listed column:

= Table.ReplaceValue(#"Changed Type", "Text to find", "Text to replace", Replacer.ReplaceText, {"Job Title"})
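For comparison outside Power Query, a one-column text replacement like the call above can be sketched in pandas (column name and strings are placeholders, not from an actual workbook):

```python
import pandas as pd

df = pd.DataFrame({"Job Title": ["Text to find", "Engineer"]})

# Replace every instance of the substring, but only within this one column
df["Job Title"] = df["Job Title"].str.replace(
    "Text to find", "Text to replace", regex=False
)
print(df)
```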

M Code For The Query Function

let BulkReplace = (DataTable as table, FindReplaceTable as table, DataTableColumn as list) =>
   let
      //Convert the FindReplaceTable to a list using the Table.ToRows function
      //so we can reference the list with an index number
      FindReplaceList = Table.ToRows(FindReplaceTable),
      //Count number of rows in the FindReplaceTable to determine
      //how many iterations are needed
      Counter = Table.RowCount(FindReplaceTable),
      //Define a function to iterate over our list
      //with the Table.ReplaceValue function
      BulkReplaceValues = (DataTableTemp, n) =>
         let
            //Replace values using the nth item in FindReplaceList,
            //substituting an empty string for null in either position
            ReplaceTable = Table.ReplaceValue(
               DataTableTemp,
               if FindReplaceList{n}{0} = null then "" else FindReplaceList{n}{0},
               if FindReplaceList{n}{1} = null then "" else FindReplaceList{n}{1},
               Replacer.ReplaceText,
               DataTableColumn
            )
         in
            //if we are not at the end of the FindReplaceList
            //then iterate through Table.ReplaceValue again
            if n = Counter - 1
            then ReplaceTable
            else @BulkReplaceValues(ReplaceTable, n + 1),
      //Evaluate the sub-function at the first row
      Output = BulkReplaceValues(DataTable, 0)
   in
      Output
in
   BulkReplace

= fBulkReplace(#"Changed Type", MyFindReplace, {"Job Title"})
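The same bulk find-and-replace pattern, driven by a two-column lookup table, can be sketched in pandas (this is not part of the original M answer; table contents and column names here are made up):

```python
import pandas as pd

# Stand-ins for DataTable and FindReplaceTable from the M function above
data = pd.DataFrame(
    {"Job Title": ["Dev", "QA", "Dev"], "Dept": ["Dev", "Ops", "QA"]}
)
find_replace = pd.DataFrame(
    {"find": ["Dev", "QA"], "replace": ["Developer", "Tester"]}
)

# Build one {old: new} mapping from the two-column table, then apply it
# only to the chosen columns, mirroring the M function's iteration
mapping = dict(zip(find_replace["find"], find_replace["replace"]))
cols = ["Job Title"]
data[cols] = data[cols].replace(mapping)
print(data)
```

Because replace() takes the whole mapping at once, no explicit recursion over the pairs is needed, and columns outside the list (here "Dept") are left untouched.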

Suggestion : 4

This lesson covers, among other things:

  • Link the output of one dplyr function to the input of another function with the 'pipe' operator %>%.
  • Add new columns to a data frame that are functions of existing columns with mutate.
  • Select certain columns in a data frame with the dplyr function select.
  • Note that the NA genera are included in the long-format data frame. Pivoting wider and then longer can be a useful way to balance out a dataset so that every replicate has the same composition.

surveys <- read_csv("data_raw/portal_data_joined.csv")
#> Rows: 34786 Columns: 13
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (6): species_id, sex, genus, species, taxa, plot_type
#> dbl (7): record_id, month, day, year, plot_id, hindfoot_length, weight
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# inspect the data
str(surveys)

# preview the data
view(surveys)

select(surveys, plot_id, species_id, weight)
select(surveys, -record_id, -species_id)
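For cross-reference, the two select() calls above map onto pandas column selection and drop(); the surveys frame below is a made-up stand-in with only a few of the real dataset's columns:

```python
import pandas as pd

# Stand-in for the surveys data frame (a small subset of its columns)
surveys = pd.DataFrame({
    "record_id": [1, 2],
    "plot_id": [3, 4],
    "species_id": ["NL", "DM"],
    "weight": [40.0, 48.0],
})

# select(surveys, plot_id, species_id, weight)
kept = surveys[["plot_id", "species_id", "weight"]]

# select(surveys, -record_id, -species_id)
dropped = surveys.drop(columns=["record_id", "species_id"])
```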

Suggestion : 5

  • Spark SQL can automatically infer the schema of a JSON dataset and load it as a DataFrame, using SparkSession.read.json on a JSON file.
  • The read.json() function loads data from a directory of JSON files where each line of the files is a JSON object.
  • DataFrame.withColumn in PySpark supports adding a new column or replacing an existing column of the same name.
  • With a SparkSession, applications can create DataFrames from an existing RDD, from a Hive table, or from Spark data sources.

// Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
   .builder()
   .appName("Spark SQL basic example")
   .config("spark.some.config.option", "some-value")
   .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._

// Java
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
   .builder()
   .appName("Java Spark SQL basic example")
   .config("spark.some.config.option", "some-value")
   .getOrCreate();

# Python
from pyspark.sql import SparkSession

spark = SparkSession\
   .builder\
   .appName("Python Spark SQL basic example")\
   .config("spark.some.config.option", "some-value")\
   .getOrCreate()

# R
sparkR.session(appName = "R Spark SQL basic example",
               sparkConfig = list(spark.some.config.option = "some-value"))
// Scala
val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

// Java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df = spark.read().json("examples/src/main/resources/people.json");

// Displays the content of the DataFrame to stdout
df.show();
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+
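The line-delimited JSON convention used by people.json (one JSON object per line) can be read without Spark as well; for instance, pandas supports it directly via lines=True. The data below mirrors the Spark example's output:

```python
import pandas as pd
from io import StringIO

# people.json in the Spark examples is line-delimited JSON:
# each line is one complete JSON object
raw = StringIO(
    '{"name":"Michael"}\n'
    '{"name":"Andy","age":30}\n'
    '{"name":"Justin","age":19}\n'
)

# lines=True tells pandas to parse each line as a separate object;
# columns and dtypes are inferred from the data, much like spark.read.json
df = pd.read_json(raw, lines=True)
print(df)
```

Missing fields (Michael has no "age") become NaN, analogous to the null shown in the Spark output.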

Suggestion : 6

  • In R, the most basic objects available to store data are vectors. As we have seen, complex datasets can usually be broken down into components that are vectors; for example, in a data frame, each column is a vector.
  • Sometimes it is useful to name the entries of a vector, for example when defining a vector of country codes, so the names connect the two.
  • We use square brackets to access specific elements of a vector.
  • When a function tries to coerce one type to another and encounters an impossible case, it usually gives a warning and turns the entry into a special value called NA, for "not available".

But then you remember that the US is a large and diverse country with 50 very different states as well as the District of Columbia (DC).

#> Warning: It is deprecated to specify `guide = FALSE` to remove a guide.
#> Please use `guide = "none"` instead.
a <- 1
b <- 1
c <- -1
a
#> [1] 1
print(a)
#> [1] 1
ls()
#> [1] "a"        "b"        "c"        "dat"      "img_path" "murders"
(-b + sqrt(b^2 - 4*a*c)) / (2*a)
#> [1] 0.618
(-b - sqrt(b^2 - 4*a*c)) / (2*a)
#> [1] -1.62
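The same quadratic-formula computation as the R snippet above, sketched in Python with the same coefficients (a = 1, b = 1, c = -1):

```python
import math

a, b, c = 1, 1, -1

# The two roots of a*x^2 + b*x + c = 0 via the quadratic formula
root1 = (-b + math.sqrt(b**2 - 4*a*c)) / (2*a)
root2 = (-b - math.sqrt(b**2 - 4*a*c)) / (2*a)

# prints: 0.618 -1.618 (R displayed the same roots as 0.618 and -1.62)
print(round(root1, 3), round(root2, 3))
```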