Equivalent of Python's list sort with key / Schwartzian transform


You could just roll your own:

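#include <algorithm>
#include <iterator>

// Sorts [first, last) by func(element).
// Note: func is re-evaluated on every comparison, not once per element.
template <typename RandomIt, typename KeyFunc>
void sort_by_key(RandomIt first, RandomIt last, KeyFunc func)
{
    using ValueType = typename std::iterator_traits<RandomIt>::value_type;
    std::sort(first, last, [=](const ValueType& a, const ValueType& b) {
        return func(a) < func(b);
    });
}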

To compute each key only once, we can even hack together a class that still lets us use std::sort:

template <typename RandomIter, typename KeyFunc>
void sort_by_key(RandomIter first, RandomIter last, KeyFunc func)
{
    using KeyT = decltype(func(*first));
    using ValueT = typename std::remove_reference<decltype(*first)>::type;

    struct Pair {
        KeyT key;
        RandomIter iter;
        boost::optional<ValueT> value;

        Pair(const KeyT& key, const RandomIter& iter)
            : key(key), iter(iter)
        { }

        Pair(Pair&& rhs)
            : key(std::move(rhs.key))
            , iter(rhs.iter)
            , value(std::move(*(rhs.iter)))
        { }

        Pair& operator=(Pair&& rhs) {
            key = std::move(rhs.key);
            *iter = std::move(rhs.value ? *rhs.value : *rhs.iter);
            value = boost::none;
            return *this;
        }

        bool operator<(const Pair& rhs) const {
            return key < rhs.key;
        }
    };

    std::vector<Pair> ordering;
    ordering.reserve(last - first);

    for (; first != last; ++first) {
        ordering.emplace_back(func(*first), first);
    }

    std::sort(ordering.begin(), ordering.end());
}

Or, if that's too hacky, here's my original solution, which requires us to write our own sort (quicksort_with_benefits is not shown here):

template <typename RandomIt, typename KeyFunc>
void sort_by_key_2(RandomIt first, RandomIt last, KeyFunc func)
{
    using KeyT = decltype(func(*first));
    std::vector<std::pair<KeyT, RandomIt> > ordering;
    ordering.reserve(last - first);

    for (; first != last; ++first) {
        ordering.emplace_back(func(*first), first);
    }

    // now sort this vector by the ordering - we're going
    // to sort ordering, but each swap has to do iter_swap too
    quicksort_with_benefits(ordering, 0, ordering.size());
}

Given a simple example (print is a helper that is not shown, assumed to output the elements in order):

int main()
{
    std::vector<int> v = {-2, 10, 4, 12, -1, -25};

    std::sort(v.begin(), v.end());
    print(v); // -25 -2 -1 4 10 12

    sort_by_key_2(v.begin(), v.end(), [](int i) { return i*i; }); 
    print(v); // -1 -2 4 10 12 -25
}

If the key type is not terribly large (if it is, I'd say measure first), you can just store a

std::vector<std::pair<key_type, value_type>> vec;

Suggestion : 2

In Python, given a list, I can sort it by a key function, e.g.:

>>> def get_value(k):
...     print "heavy computation for", k
...     return {"a": 100, "b": 30, "c": 50, "d": 0}[k]
...
>>> items = ['a', 'b', 'c', 'd']
>>> items.sort(key=get_value)
heavy computation for a
heavy computation for b
heavy computation for c
heavy computation for d
>>> items
['d', 'b', 'c', 'a']

Is there an equivalent in C++? std::sort() only allows me to provide a custom comparator (equivalent of Python's items.sort(cmp=...)), not a key function. If not, is there any well-tested, efficient, publicly available implementation of the equivalent I can drop into my code?

The key argument is new in Python 2.4; for older versions this kind of sorting is quite simple to do with list comprehensions. To sort a list of strings by their uppercase values:

tmp1 = [(x.upper(), x) for x in L]  # Schwartzian transform
tmp1.sort()
Usorted = [x[1] for x in tmp1]

To sort by the integer value of a subfield extending from ...
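The behaviour the question relies on (one key computation per element) can be sketched in Python 3 as follows. This is only an illustration: sort_by_key and the calls list are hypothetical names introduced here, and the index stored in each tuple is an assumption added so that elements with equal keys are never compared directly.

```python
def sort_by_key(items, key):
    """Decorate-sort-undecorate: key(x) is computed exactly once per element."""
    # Decorate: pair each element with its key. The index breaks ties, so the
    # elements themselves are never compared and equal keys keep input order.
    decorated = [(key(x), i, x) for i, x in enumerate(items)]
    decorated.sort()
    # Undecorate: keep only the original elements.
    return [x for _, _, x in decorated]


calls = []  # records every invocation of the "heavy" key function


def get_value(k):
    calls.append(k)
    return {"a": 100, "b": 30, "c": 50, "d": 0}[k]


items = ["a", "b", "c", "d"]
result = sort_by_key(items, get_value)   # hand-rolled version
builtin = sorted(items, key=get_value)   # built-in key= support
```

Both calls invoke get_value four times each, once per element, which is the property a C++ equivalent would need to preserve.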


Suggestion : 3

Both list.sort() and sorted() have a key parameter to specify a function (or other callable) to be called on each list element prior to making comparisons. Another name for this idiom is Schwartzian transform, after Randal L. Schwartz, who popularized it among Perl programmers.

Many constructs given in this HOWTO assume Python 2.4 or later. Before that, there was no sorted() builtin and list.sort() took no keyword arguments. Instead, all of the Py2.x versions supported a cmp parameter to handle user-specified comparison functions.

Another difference is that the list.sort() method is only defined for lists. In contrast, the sorted() function accepts any iterable.

>>> sorted([5, 2, 3, 1, 4])
[1, 2, 3, 4, 5]

>>> a = [5, 2, 3, 1, 4]
>>> a.sort()
>>> a
[1, 2, 3, 4, 5]

>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
[1, 2, 3, 4, 5]

>>> sorted("This is a test string from Andrew".split(), key=str.lower)
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']

>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]

>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))
...
>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
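The two differences mentioned above (sorted() accepts any iterable and returns a new list, while list.sort() works in place) can be sketched like this. The example uses operator.itemgetter from the standard library as the key function; the variable names are illustrative only.

```python
from operator import itemgetter

# A tuple of records rather than a list, to show that sorted() takes any iterable.
students = (("john", "A", 15), ("jane", "B", 12), ("dave", "B", 10))

# sorted() returns a new list and leaves its input untouched.
by_age = sorted(students, key=itemgetter(2))

# itemgetter with several indices builds a tuple key: grade first, then age.
by_grade_then_age = sorted(students, key=itemgetter(1, 2))

# list.sort() exists only on lists, sorts in place, and returns None.
as_list = list(students)
returned = as_list.sort(key=itemgetter(2))
```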

Suggestion : 4

Python programmers use the transform in sorts where the comparison operation may be expensive. In this example, f(x) returns a key which is suitable for using for sorting. For instance, it might return the length of x, or it might do a database lookup based on the value of x.

This is the idiomatic Haskell way of comparing by a transformed version of the elements. However, this is not the full Schwartzian Transform, because the comparison of the memoized keys is fixed. (Though there is a trick for getting around that.)

@sorted_array =
    map  { $_->[0] }              # extract original list elements
    sort { $a->[1] <=> $b->[1] }  # sort list by keys
    map  { [$_, -M $_] }          # pair up list elements with keys
    @files_array;

my @sorted_files =
    map $_->[0],                  # extract original name
    sort {
           $a->[1] <=> $b->[1]    # sort first numerically by size (smallest first)
        or $b->[2] <=> $a->[2]    # then numerically descending by modtime age (oldest first)
        or $a->[0] cmp $b->[0]    # then stringwise by original name
    }
    map [$_, -s $_, -M $_],       # compute tuples of name, size, modtime
    glob "*";                     # of all files in the directory

# python2.4
new_list = sorted(old_list, key=f)

# python2.4; removed in python3.0
new_list = sorted(old_list, key=f, cmp=lambda a, b: cmp(a[0], b[0]) or cmp(b[1], a[1]))

# python2.3
new_list = [(f(x), x) for x in old_list]
new_list.sort()
new_list = [x[1] for x in new_list]

new_list = map(lambda x: x[1],
               sorted(
                   map(lambda x: (f(x), x),
                       old_list)))
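Since the cmp argument no longer exists in Python 3, one possible way to combine cached keys with a custom comparison is functools.cmp_to_key. The sketch below is an assumption about how that might look, not a translation endorsed by the source: cmp, compare_keys, sort_with_key_and_cmp and the key function f are all hypothetical helpers defined here.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison, standing in for the builtin removed in Python 3."""
    return (a > b) - (a < b)


def compare_keys(ka, kb):
    # First key component ascending, second component descending.
    return cmp(ka[0], kb[0]) or cmp(kb[1], ka[1])


def sort_with_key_and_cmp(items, key, compare):
    # Decorate: each key is computed once and stored next to its element.
    decorated = [(key(x), x) for x in items]
    # Sort the pairs, applying the custom comparison to the cached keys only.
    decorated.sort(key=cmp_to_key(lambda p, q: compare(p[0], q[0])))
    # Undecorate.
    return [x for _, x in decorated]


def f(word):
    # Hypothetical key: (length, word).
    return (len(word), word)


words = ["pear", "fig", "apple", "kiwi", "date"]
result = sort_with_key_and_cmp(words, f, compare_keys)
```

Here the words come out by increasing length, with ties broken in reverse alphabetical order.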

Suggestion : 5

A Schwartzian transform involves the functional idiom described above, which does not use temporary arrays.

The Schwartzian transform is a version of a Lisp idiom known as decorate-sort-undecorate, which avoids recomputing the sort keys by temporarily associating them with the input items. This approach is similar to memoization, which avoids repeating the calculation of the key corresponding to a specific input value. By comparison, this idiom assures that each input item's key is calculated exactly once, which may still result in repeating some calculations if the input data contains duplicate items.

In D 2 and above, the schwartzSort function is available. It might require less temporary data and be faster than the Perl idiom or the decorate–sort–undecorate idiom present in Python and Lisp. This is because sorting is done in-place, and only minimal extra data (one array of transformed elements) is created.

@sorted = map  { $_->[0] }
          sort { $a->[1] <=> $b->[1] or $a->[0] cmp $b->[0] } # Use numeric comparison, fall back to string sort on original
          map  { [$_, length($_)] }                            # Calculate the length of the string
               @unsorted;

@sorted = map  { $_->[0] }
          sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }
          map  { [$_, foo($_)] }
               @unsorted;

@sorted = sort { foo($a) cmp foo($b) } @unsorted;
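The difference between decorate-sort-undecorate and memoization when the input contains duplicates can be made concrete with a small Python sketch. Everything here (slow_key, dsu_sort, memo_sort, the calls counter) is a hypothetical illustration; functools.lru_cache is used as one readily available way to memoize.

```python
from functools import lru_cache

calls = {"dsu": 0, "memo": 0}  # how often the key is actually computed


def slow_key(s):
    # Stand-in for an expensive key computation.
    return len(s)


def dsu_sort(items):
    def counted(s):
        calls["dsu"] += 1
        return slow_key(s)

    # One key computation per input item, duplicates included.
    decorated = [(counted(x), x) for x in items]
    decorated.sort()  # numeric key first, then the string itself
    return [x for _, x in decorated]


@lru_cache(maxsize=None)
def memo_key(s):
    # Only reached once per distinct input value.
    calls["memo"] += 1
    return slow_key(s)


def memo_sort(items):
    return sorted(items, key=lambda s: (memo_key(s), s))


data = ["pear", "fig", "pear", "apple", "fig", "pear"]
dsu_result = dsu_sort(data)
memo_result = memo_sort(data)
```

With six items but only three distinct values, the decorated version computes the key six times and the memoized version three times, while both produce the same ordering.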

The same algorithm can be written procedurally to better illustrate how it works, but this requires using temporary arrays, and is not a Schwartzian transform. The following example pseudo-code implements the algorithm in this way:

 for each file in filesArray
     insert array(file, modificationTime(file)) at end of transformedArray

 function simpleCompare(array a, array b) {
     return a[2] < b[2]
 }

 transformedArray := sort(transformedArray, simpleCompare)

 for each file in transformedArray
     insert file[1] at end of sortedArray
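The procedural steps above map fairly directly onto Python. In this sketch the function name sort_by_cached_key and the modification_time table are assumptions made for illustration (a real program would query the file system instead), and Python's 0-based indexing replaces the 1-based indexing of the pseudo-code.

```python
def sort_by_cached_key(items, key_func):
    # Step 1: build the temporary array of (item, key) pairs.
    transformed = []
    for item in items:
        transformed.append((item, key_func(item)))

    # Step 2: sort the pairs by the cached key (the second field),
    # which plays the role of simpleCompare in the pseudo-code.
    transformed.sort(key=lambda pair: pair[1])

    # Step 3: copy the original items out in their new order.
    sorted_items = []
    for pair in transformed:
        sorted_items.append(pair[0])
    return sorted_items


# Hypothetical modification times standing in for modificationTime(file).
modification_time = {"notes.txt": 300, "a.log": 100, "main.c": 200}

files = ["notes.txt", "a.log", "main.c"]
result = sort_by_cached_key(files, modification_time.get)
```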

The first known online appearance of the Schwartzian transform is a December 16, 1994 posting by Randal Schwartz to a thread in comp.unix.shell Usenet newsgroup, crossposted to comp.lang.perl. (The current version of the Perl Timeline is incorrect and refers to a later date in 1995.) The thread began with a question about how to sort a list of lines by their "last" word:

adjn: Joshua Ng
adktk: KaLap Timothy Kwong
admg: Mahalingam Gobieramanan
admln: Martha L. Nangalama

#!/usr/bin/perl
require 5; # New features, new bugs!
print
    map { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map { [$_, /(\S+)$/] }
    <>;

function spaceballs_sort(array &$a): void {
    array_walk($a, function (&$v, $k) {
        $v = array($v, $k);
    });
    asort($a);
    array_walk($a, function (&$v, $_) {
        $v = $v[0];
    });
}
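Returning to the thread's original problem of sorting lines by their last word, a rough Python counterpart of the Perl posting might look like the following. The function name sort_by_last_word is hypothetical, and the input is given as a list of strings rather than read from standard input.

```python
import re


def sort_by_last_word(lines):
    # Decorate: extract the last run of non-whitespace characters once per line.
    decorated = []
    for line in lines:
        match = re.search(r"(\S+)$", line)
        decorated.append((match.group(1) if match else "", line))

    # Sort on the extracted word only, as the Perl version does.
    decorated.sort(key=lambda pair: pair[0])

    # Undecorate: return the original lines in their new order.
    return [line for _, line in decorated]


lines = [
    "adjn: Joshua Ng",
    "adktk: KaLap Timothy Kwong",
    "admg: Mahalingam Gobieramanan",
    "admln: Martha L. Nangalama",
]
result = sort_by_last_word(lines)
```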