merge/zip two series into ndarray of ndarray


zip is not necessary here; for better performance, use NumPy or pandas:

arr = np.hstack((S1.values[:, None], S2.values[:, None]))

Or:

arr = np.vstack((S1, S2)).T

print(arr)
[[-0.483415 -0.961871]
 [-0.514082 -0.964762]
 [-0.515724 -0.963798]
 [-0.519375 -0.962112]
 [-0.505685 -0.962028]]
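
As a minimal, self-contained sketch of the two approaches above: the Series below are rebuilt from the values printed in the output, and the names arr_h and arr_v are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output above.
S1 = pd.Series([-0.483415, -0.514082, -0.515724, -0.519375, -0.505685])
S2 = pd.Series([-0.961871, -0.964762, -0.963798, -0.962112, -0.962028])

# Option 1: turn each 1-D array into a column, then join the columns side by side.
arr_h = np.hstack((S1.values[:, None], S2.values[:, None]))

# Option 2: stack the Series as two rows, then transpose to get two columns.
arr_v = np.vstack((S1, S2)).T

print(arr_h.shape)
print(np.array_equal(arr_h, arr_v))
```

Both variants pair the values by position and ignore the Series index.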

Just cast it to an ndarray:

>>> a = [1, 2, 3, 4]
>>> b = [5, 6, 7, 8]
>>> c = list(zip(a, b))
>>> c
[(1, 5), (2, 6), (3, 7), (4, 8)]
>>> d = np.array(c)
>>> d
array([[1, 5],
       [2, 6],
       [3, 7],
       [4, 8]])
>>> d.shape
(4, 2)

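Try the following, bearing in mind that for 1-D inputs hstack joins them end to end into a single flat array instead of pairing the values:

numpy.hstack((S1, S2))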
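
A small sketch of what this call returns for two 1-D Series, next to np.column_stack, which is one way to get the two-column layout the question asks for. The sample values and the names flat and paired are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

S1 = pd.Series([1.0, 2.0, 3.0])
S2 = pd.Series([4.0, 5.0, 6.0])

# hstack on 1-D inputs concatenates them into one flat array.
flat = np.hstack((S1, S2))

# column_stack treats each 1-D input as a column of a 2-D array.
paired = np.column_stack((S1, S2))

print(flat.shape)
print(paired.shape)
```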


Suggestion : 3

How do I do the same "zip" on two pandas series but get back an ndarray of ndarray? I don't want loops.

I have two pandas series of the same length like this:

S1 =
0   -0.483415
1   -0.514082
2   -0.515724
3   -0.519375
4   -0.505685
...

S2 =
1   -0.961871
2   -0.964762
3   -0.963798
4   -0.962112
5   -0.962028
...

I would like to zip them into a numpy ndarray of ndarray so that it looks like this:

<class 'numpy.ndarray'>
[[-0.483415 -0.961871]
 [-0.514082 -0.964762]
 [-0.515724 -0.963798]
 ...
]

If I wanted a list of tuples I could say this:

v = list(zip(S1, S2))
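
One detail worth checking in the data above is that S1 and S2 carry different index labels (0 to 4 versus 1 to 5). zip pairs values by position, while label-based pandas operations such as pd.concat align on the index. The sketch below uses shortened copies of the two Series to show the difference; the names by_position and by_label are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened copies of the Series from the question, keeping the offset index.
S1 = pd.Series([-0.483415, -0.514082, -0.515724], index=[0, 1, 2])
S2 = pd.Series([-0.961871, -0.964762, -0.963798], index=[1, 2, 3])

# Positional pairing: the index labels are ignored.
by_position = np.array(list(zip(S1, S2)))

# Label-based pairing: rows are aligned on the union of both indexes,
# and missing values are filled with NaN.
by_label = pd.concat([S1, S2], axis=1).to_numpy()

print(by_position.shape)
print(by_label.shape)
```

If positional pairing is what you want, working on the underlying arrays avoids any surprise from index alignment.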


Suggestion : 4

  • Split array into a list of multiple sub-arrays of equal size.
  • Assemble an nd-array from nested lists of blocks.
  • The axis in the result array along which the input arrays are stacked.
  • The stacked array has one more dimension than the input arrays.

>>> arrays = [np.random.randn(3, 4) for _ in range(10)]
>>> np.stack(arrays, axis=0).shape
(10, 3, 4)
>>> np.stack(arrays, axis=1).shape
(3, 10, 4)
>>> np.stack(arrays, axis=2).shape
(3, 4, 10)

>>> a = np.array([1, 2, 3])
>>> b = np.array([4, 5, 6])
>>> np.stack((a, b))
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.stack((a, b), axis=-1)
array([[1, 4],
       [2, 5],
       [3, 6]])
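
Applied to the original question, the axis=-1 form above could be used on two Series as in this sketch. The sample values are made up, and converting with to_numpy() first is a choice made here for clarity, not something the NumPy documentation requires.

```python
import numpy as np
import pandas as pd

S1 = pd.Series([0.1, 0.2, 0.3])
S2 = pd.Series([0.9, 0.8, 0.7])

# Stacking along a new last axis places the two inputs side by side as columns.
arr = np.stack((S1.to_numpy(), S2.to_numpy()), axis=-1)

print(arr.shape)
```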

Suggestion : 5

Paul Hudson    @twostraws    May 28th 2019

The zip() function is designed to merge two sequences into a single sequence of tuples. For example, here is an array of wizards:

let wizards1 = ["Harry", "Ron", "Hermione"]

And here’s a matching array of the animals owned by those wizards:

let animals1 = ["Hedwig", "Scabbers", "Crookshanks"]

Using zip() we can combine them together:

let combined1 = zip(wizards1, animals1)

If you convert combined1 to an array and print it, you’ll see it contains:

[("Harry", "Hedwig"), ("Ron", "Scabbers"), ("Hermione", "Crookshanks")]

If the two arrays differ in size, zip() stops at the end of the shorter one. For example, this code will print out the animals belonging to the first three wizards, but nothing for Draco because he doesn’t have a matching animal:

let wizards2 = ["Harry", "Ron", "Hermione", "Draco"]
let animals2 = ["Hedwig", "Scabbers", "Crookshanks"]

for (wizard, animal) in zip(wizards2, animals2) {
   print("\(wizard) has \(animal)")
}
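
Python's built-in zip() truncates in the same way, which matters when zipping two Series or lists of unequal length. A short sketch with the same sample data (the variable names are illustrative):

```python
wizards = ["Harry", "Ron", "Hermione", "Draco"]
animals = ["Hedwig", "Scabbers", "Crookshanks"]

# zip stops as soon as the shorter input is exhausted, so Draco is dropped.
pairs = list(zip(wizards, animals))

for wizard, animal in pairs:
    print(f"{wizard} has {animal}")
```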

Suggestion : 6

Using the pandas.concat() method you can combine two or more Series into a DataFrame. Besides this, you can also use pandas.merge() or DataFrame.join() to build a DataFrame from multiple Series; Series.append(), which older tutorials also list, was deprecated in pandas 1.4 and removed in 2.0.

pandas.concat() combines pandas objects, for example multiple Series, along a particular axis (column-wise or row-wise). When you combine two Series into a DataFrame this way, the result has two columns. This article explains different ways to combine two or more Series into a DataFrame.

You can also use DataFrame.join() to join two Series. Because join() is a DataFrame method, you first need a DataFrame; one way to get one is to create it from a Series and then join the other Series to it.

concat() takes several parameters; for this scenario we pass a list of the Series to combine and axis=1 to merge them as columns instead of rows. Note that axis=0 appends the Series as rows instead.

import pandas as pd
# Create pandas Series
courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])
discount = pd.Series([1000, 2300, 1000])

# Combine two series.
df = pd.concat([courses, fees], axis = 1)

# It also supports to combine multiple series.
df = pd.concat([courses, fees, discount], axis = 1)
print(df)

# Output:
         0      1     2
0    Spark  22000  1000
1  PySpark  25000  2300
2   Hadoop  23000  1000
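
To make the axis parameter concrete, here is a hedged sketch contrasting axis=0 and axis=1 on two of the Series used above. The names as_rows and as_cols are illustrative.

```python
import pandas as pd

courses = pd.Series(["Spark", "PySpark", "Hadoop"])
fees = pd.Series([22000, 25000, 23000])

# axis=0 appends one Series below the other and returns a longer Series.
as_rows = pd.concat([courses, fees], axis=0)

# axis=1 places the Series next to each other and returns a DataFrame.
as_cols = pd.concat([courses, fees], axis=1)

print(type(as_rows).__name__, len(as_rows))
print(type(as_cols).__name__, as_cols.shape)
```

With axis=0 the original index labels are kept, so they repeat unless you pass ignore_index=True.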

Note that if the Series have no names and you do not provide column names while merging, pandas assigns numbers to the columns.

# Create Series by assigning names
courses = pd.Series(["Spark", "PySpark", "Hadoop"], name = 'courses')
fees = pd.Series([22000, 25000, 23000], name = 'fees')
discount = pd.Series([1000, 2300, 1000], name = 'discount')

df = pd.concat([courses, fees, discount], axis = 1)
print(df)
# Assign Index to Series
index_labels = ['r1', 'r2', 'r3']
courses.index = index_labels
fees.index = index_labels
discount.index = index_labels

# Concat Series by Changing Names
df = pd.concat({
   'Courses': courses,
   'Course_Fee': fees,
   'Course_Discount': discount
}, axis = 1)
print(df)

Finally, let’s see how to reset the index using the reset_index() method. This moves the current index into a column and adds a new default index to the combined DataFrame.

#change the index to a column & create new index
df = df.reset_index()
print(df)
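
A self-contained sketch of what reset_index() does to the combined DataFrame, using a reduced two-column version of the example. The drop=True variant is shown as an additional option that the text above does not cover; the names reset and dropped are illustrative.

```python
import pandas as pd

index_labels = ["r1", "r2", "r3"]
courses = pd.Series(["Spark", "PySpark", "Hadoop"], index=index_labels)
fees = pd.Series([22000, 25000, 23000], index=index_labels)

df = pd.concat({"Courses": courses, "Course_Fee": fees}, axis=1)

# The old index becomes a regular column named "index".
reset = df.reset_index()

# With drop=True the old index is discarded instead of being kept as a column.
dropped = df.reset_index(drop=True)

print(list(reset.columns))
print(list(dropped.columns))
```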

Suggestion : 7

  • Data alignment and relational data manipulations for merging and joining together heterogeneous data sets
  • Tools for reading / writing array data to disk and working with memory-mapped files
  • Fast vectorized array operations for data munging and cleaning, subsetting and filtering, transformation, and any other kinds of computations
  • For more on file reading and writing, especially tabular or spreadsheet-like data, see the later chapters involving pandas and DataFrame objects.

In [13]: data1 = [6, 7.5, 8, 0, 1]

In [14]: arr1 = np.array(data1)

In [15]: arr1
Out[15]: array([6. , 7.5, 8. , 0. , 1. ])

In [27]: arr1 = np.array([1, 2, 3], dtype=np.float64)

In [28]: arr2 = np.array([1, 2, 3], dtype=np.int32)

In [29]: arr1.dtype
Out[29]: dtype('float64')

In [30]: arr2.dtype
Out[30]: dtype('int32')

In [45]: arr = np.array([[1., 2., 3.], [4., 5., 6.]])

In [46]: arr
Out[46]:
array([[1., 2., 3.],
       [4., 5., 6.]])

In [47]: arr * arr
Out[47]:
array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [48]: arr - arr
Out[48]:
array([[0., 0., 0.],
       [0., 0., 0.]])

In [51]: arr = np.arange(10)

In [52]: arr
Out[52]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [53]: arr[5]
Out[53]: 5

In [54]: arr[5:8]
Out[54]: array([5, 6, 7])

In [55]: arr[5:8] = 12

In [56]: arr
Out[56]: array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [75]: arr[1:6]
Out[75]: array([ 1,  2,  3,  4, 64])
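
The slice assignments above work because a basic NumPy slice is a view onto the original array rather than a copy. A minimal sketch of that behavior, with illustrative names view and independent:

```python
import numpy as np

arr = np.arange(10)

# A basic slice is a view: writing through it changes the original array.
view = arr[5:8]
view[:] = 12

# An explicit copy is independent of the original.
independent = arr[5:8].copy()
independent[:] = 0

print(arr)
print(independent)
```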

In [83]: names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

In [84]: data = np.random.randn(7, 4)

In [85]: names
Out[85]:
array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'],
      dtype='|S4')

In [86]: data
Out[86]:
array([[-0.048 ,  0.5433, -0.2349,  1.2792],
       [-0.268 ,  0.5465,  0.0939, -2.0445],
       [-0.047 , -2.026 ,  0.7719,  0.3103],
       [ 2.1452,  0.8799, -0.0523,  0.0672],
       [-1.0023, -0.1698,  1.1503,  1.7289],
       [ 0.1913,  0.4544,  0.4519,  0.5535],
       [ 0.5994,  0.8174, -0.9297, -1.2564]])
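
A setup like this, one label per row of data, is commonly used to demonstrate boolean indexing. The excerpt does not show that step, so the following is only a sketch of one likely use; the seeded generator and the names mask and bob_rows are assumptions.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])

# Seeded generator so the sketch is reproducible (values differ from the excerpt).
rng = np.random.default_rng(0)
data = rng.standard_normal((7, 4))

# Comparing the label array with a string gives one boolean per row.
mask = names == 'Bob'

# Indexing with the mask selects the rows where it is True.
bob_rows = data[mask]

print(mask)
print(bob_rows.shape)
```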

Suggestion : 8

  • Creating a DataFrame by passing a NumPy array, with a datetime index and labeled columns
  • Creating a DataFrame by passing a dictionary of objects that can be converted into a series-like structure
  • Setting by assigning with a NumPy array
  • Creating a Series by passing a list of values, letting pandas create a default integer index

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8])

In [4]: s
Out[4]:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [5]: dates = pd.date_range("20130101", periods=6)

In [6]: dates
Out[6]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06'],
              dtype='datetime64[ns]', freq='D')

In [7]: df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

In [8]: df
Out[8]:
                   A         B         C         D
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
2013-01-02  1.212112 -0.173215  0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
2013-01-04  0.721555 -0.706771 -1.039575  0.271860
2013-01-05 -0.424972  0.567020  0.276232 -1.087401
2013-01-06 -0.673690  0.113648 -1.478427  0.524988

In [9]: df2 = pd.DataFrame(
   ...:     {
   ...:         "A": 1.0,
   ...:         "B": pd.Timestamp("20130102"),
   ...:         "C": pd.Series(1, index=list(range(4)), dtype="float32"),
   ...:         "D": np.array([3] * 4, dtype="int32"),
   ...:         "E": pd.Categorical(["test", "train", "test", "train"]),
   ...:         "F": "foo",
   ...:     }
   ...: )
   ...:

In [10]: df2
Out[10]:
     A          B    C  D      E    F
0  1.0 2013-01-02  1.0  3   test  foo
1  1.0 2013-01-02  1.0  3  train  foo
2  1.0 2013-01-02  1.0  3   test  foo
3  1.0 2013-01-02  1.0  3  train  foo

In [11]: df2.dtypes
Out[11]:
A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object
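
Column dtypes matter when converting a DataFrame back to an ndarray, which is the subject of this page. The sketch below uses two small hypothetical frames (floats and mixed are not from the original) to show how DataFrame.to_numpy() picks a result dtype.

```python
import numpy as np
import pandas as pd

# All columns share one dtype: the array keeps that dtype.
floats = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})

# Columns of different kinds: NumPy falls back to a generic object array.
mixed = pd.DataFrame({"A": [1.0, 2.0], "F": ["foo", "bar"]})

print(floats.to_numpy().dtype)
print(mixed.to_numpy().dtype)
```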

In [12]: df2.<TAB>  # noqa: E225, E999
df2.A                  df2.bool
df2.abs                df2.boxplot
df2.add                df2.C
df2.add_prefix         df2.clip
df2.add_suffix         df2.columns
df2.align              df2.copy
df2.all                df2.count
df2.any                df2.combine
df2.append             df2.D
df2.apply              df2.describe
df2.applymap           df2.diff
df2.B                  df2.duplicated