What's the fastest way to create and sort timestamped data using Python?
Let's say I have two arrays. The first holds the timestamps and the second holds the data.
timeStamp=['0001','0002','0003',...,'9999']
data=[6234,2372,1251,...,5172]
What's the best way to store them? Suppose I want to sort the data from smallest to largest while keeping each timestamp attached to its value?
uj5u.com enthusiastic netizens replied:
There are multiple ways to do this. Let's take the following sample data:
timeStamp=[9,1,2,3,9999]
data=[1245, 6234,2372,1251,5172]
Using base Python and zip
This is the default way of handling data, especially lists. The zip function pairs up two or more lists element by element and yields tuples. You can then call sorted with a lambda function as the key, which sorts the combined list by the element at a specific position in each tuple.
l=zip(timeStamp, data) #storing 2 arrays by attaching them elementwise
print(sorted(l, key=lambda x: x[0]))
[(1, 6234), (2, 2372), (3, 1251), (9, 1245), (9999, 5172)]
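The question asks for sorting by the data values rather than by timestamp. A minimal sketch of that variant, reusing the sample lists above. The names pairs and by_data are only illustrative. Since zip returns a one-shot iterator in Python 3, it is wrapped in list here in case the pairs are needed again:

```python
timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

# zip() returns a one-shot iterator in Python 3; list() keeps the pairs reusable.
pairs = list(zip(timeStamp, data))

# Sort by the second element of each tuple (the data value).
by_data = sorted(pairs, key=lambda pair: pair[1])
print(by_data)
# [(9, 1245), (3, 1251), (2, 2372), (9999, 5172), (1, 6234)]
```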
Using numpy and argsort
NumPy lets you work with multidimensional arrays. For two lists, you can simply use np.stack to combine them into a 2D array.
For sorting, you can call argsort() on the first row (the timestamps); it returns the indices that would sort that row. You can then use these indices to index the original 2D array and get its columns ordered by timestamp.
import numpy as np
arr=np.stack([timeStamp, data])
arr[:,arr[0].argsort()]
array([[   1,    2,    3,    9, 9999],
       [6234, 2372, 1251, 1245, 5172]])
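To order by the data values instead, the same idea applies with argsort taken on the second row. This is a sketch under the assumption that both lists hold integers, so the stacked array keeps an integer dtype; order and sorted_arr are illustrative names:

```python
import numpy as np

timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

arr = np.stack([timeStamp, data])  # shape (2, 5): row 0 = timestamps, row 1 = data
order = arr[1].argsort()           # indices that would sort the data row
sorted_arr = arr[:, order]         # reorder both rows with the same indices
print(sorted_arr)
```

Both rows are reordered with one index array, so each timestamp stays in the same column as its value.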
Using pandas DataFrames and sort_values
Finally, the best way to handle multiple columns at the same time is to treat them as columns in a DataFrame. Pandas provides a convenient framework for row/column data, which is very useful here because you can also refer to each array by its column name.
The sort_values method lets you quickly sort the whole table by column name.
import pandas as pd
df=pd.DataFrame(zip(timeStamp, data), columns=['timeStamp','data'])
print(df.sort_values('timeStamp'))
   timeStamp  data
1          1  6234
2          2  2372
3          3  1251
0          9  1245
4       9999  5172
uj5u.com enthusiastic netizens replied:
You can use a 2D array (a list of lists). Build it with:
timestamp_data=[ [timeStamp[i], data[i]] for i in range(len(timeStamp)) ]
You can then sort it by the data value:
sorted_timestamp_data=sorted(timestamp_data, key=lambda row: row[1])
uj5u.com enthusiastic netizens replied:
Dictionaries are very useful here. You can zip data and timeStamp, sort the resulting tuples by data, and then convert them to a dict (dictionaries preserve insertion order). You then have data-timestamp pairs, where the data is the key and the timestamp is the value.
out=dict(sorted(zip(data, timeStamp)))
Output:
{1251: '0003', 2372: '0002', 5172: '9999', 6234: '0001'}
If you want two separate lists instead, do not convert to a dict; unpack back into the lists:
data[:], timeStamp[:]=zip(*sorted(zip(data,timeStamp)))
Output:
[1251, 2372, 5172, 6234], ['0003', '0002', '9999', '0001']
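One caveat with using the data values as dictionary keys: if the same value occurs twice, the later pair overwrites the earlier one, so a timestamp is silently lost. A small sketch with a hypothetical sample (not from the question) in which 6234 appears twice:

```python
timeStamp = ['0001', '0002', '0003', '0004']
data = [6234, 2372, 6234, 1251]  # hypothetical sample: 6234 appears twice

as_dict = dict(sorted(zip(data, timeStamp)))  # duplicate keys collapse
as_pairs = sorted(zip(data, timeStamp))       # a list of tuples keeps every pair

print(as_dict)   # {1251: '0004', 2372: '0002', 6234: '0003'}
print(as_pairs)  # [(1251, '0004'), (2372, '0002'), (6234, '0001'), (6234, '0003')]
```

If duplicate values are possible, the list of tuples (or the two unpacked lists) is the safer choice.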
uj5u.com enthusiastic netizens replied:
It depends on how you want to use it. If you don't want to use another library, I would use something like this:
result=sorted(({"timestamp": ts, "data": d} for ts, d in zip(timeStamp, data)), key=lambda r: r["data"])
This is basically a list of dictionaries sorted by data. I would choose a list of dictionaries because it is more expressive than a list of tuples.
uj5u.com enthusiastic netizens replied:
To organize your data the way you describe, you can simply do:
sorted(zip(timeStamp, data), key=lambda x: x[1])
or
from operator import itemgetter
sorted(zip(timeStamp, data), key=itemgetter(1))
To save this object, you can use pickle. Obviously, there are many other options for saving it.
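A minimal sketch of the pickle round trip, assuming the sorted list of tuples from above is what should be saved. It serializes to bytes in memory; pickle.dump and pickle.load do the same with a file opened in binary mode. The names records, blob and restored are illustrative:

```python
import pickle

timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]
records = sorted(zip(timeStamp, data), key=lambda x: x[1])

blob = pickle.dumps(records)   # serialize the list of tuples to bytes
restored = pickle.loads(blob)  # rebuild an equal object from those bytes

print(restored == records)
```

Only unpickle data from a source you trust, since loading a pickle can execute arbitrary code.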
uj5u.com enthusiastic netizens replied:
Well, it's easy
records=list(zip(data, timeStamp))
Then sort:
records.sort()
In Python, tuples are compared element by element from left to right, so no key function is needed in this case. That's it. As some of the comments say, there is no need to make it overly complicated.
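A short illustration of that left-to-right comparison, using a hypothetical sample with a repeated data value: the first element decides the order, and the second element (the timestamp) only breaks ties.

```python
# Hypothetical records as (data, timestamp) tuples; 1251 appears twice.
records = [(5172, '9999'), (1251, '0003'), (1251, '0001'), (6234, '0002')]

# No key function: tuples compare on data first, then on timestamp for ties.
records.sort()
print(records)
# [(1251, '0001'), (1251, '0003'), (5172, '9999'), (6234, '0002')]
```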