In day-to-day data analysis, data is often analyzed by date. For example, a user's order times reveal which products they prefer in different time periods; the access times in an access log reveal the system's access cycles and load, which gives a basis for allocating resources across time periods; and a user's clock-in times reveal their behavior patterns, working hours, and type of work. Analysis based on date (time) is therefore very common.
The format in which time is stored also varies with how the data is recorded. Some systems record it in a format like 2024 2 1; some use a format like 2024-02-01 09:31:52; some store a numeric timestamp directly (1698724307189). There are many more storage formats than can be listed here.
Supporting all of these formats yourself usually means writing a lot of tedious code, and that only covers the simplest task of extracting the year, month, and day. Other operations, such as adding and subtracting times, are harder still. As one of the most powerful toolsets for data analysis, pandas provides a strong set of tools for processing time data, so let's take a look.
The core time-related types covered here are:
Timestamp: a single point in time, such as the moment a user places an order or the moment the system is accessed.
DatetimeIndex: a sequence of points in time, equivalent to a list of Timestamps.
Timedelta: a single duration, such as 3 minutes or 5 minutes. Durations come in different units; the common ones are days, hours, and minutes.
TimedeltaIndex: a sequence of durations; it relates to Timedelta as DatetimeIndex relates to Timestamp.
DateOffset: an offset along the calendar. For example, offsetting February 1, 2024 back by one calendar day gives January 31, 2024. DateOffset supports several kinds of shift, for example by business day, so that shifting 10 a.m. on a Friday to the next business day gives 10 a.m. on Monday.
When processing time in pandas, the following scenarios come up most often:
1. Convert time data in various formats to the Timestamp type.
2. Use the Timestamp type to get various properties of a time, such as the date, the time of day, or the day of the week.
3. Use Timestamp and Timedelta to do time arithmetic such as addition and subtraction, for example adding 3 minutes to the current time. To shift along the calendar, combine them with DateOffset.
4. Filter the records in a DataFrame by time. Set the time column as a DatetimeIndex, then filter by time as you would with a normal index.
In pandas, parsing time data essentially means converting it to the Timestamp type first, because later operations are only possible after the conversion.
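Before going into parsing, it may help to see the types listed earlier side by side. The following is only a minimal sketch; the variable names and sample values are illustrative assumptions, not taken from any real data set.

```python
import pandas as pd

# Timestamp: a single point in time.
ts = pd.Timestamp("2024-02-01 10:00:00")

# DatetimeIndex: a sequence of points in time.
idx = pd.DatetimeIndex(["2024-02-01", "2024-02-02", "2024-02-03"])

# Timedelta: a single duration.
delta = pd.Timedelta(minutes=3)

# TimedeltaIndex: a sequence of durations.
deltas = pd.to_timedelta(["3 minutes", "5 minutes"])

# DateOffset: a shift along the calendar, here one calendar day back.
previous_day = ts - pd.DateOffset(days=1)

print(type(ts).__name__, type(idx).__name__)
print(type(delta).__name__, type(deltas).__name__)
print(ts + delta)       # 2024-02-01 10:03:00
print(previous_day)     # 2024-01-31 10:00:00
```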
pandas provides the to_datetime function to convert different kinds of time data to the Timestamp type.
Strings are a common storage format for time, and to_datetime supports almost all of the mainstream notations, for example:
import pandas as pd
# A common date + time representation.
pd_time = pd.to_datetime("2023-08-29 17:17:22")
print(type(pd_time), pd_time)
# Abbreviated time in 12-hour format.
pd_time1 = pd.to_datetime("2023-08-29 5:17pm")
print(type(pd_time1), pd_time1)
# Month/day/year separated by slashes.
pd_time2 = pd.to_datetime("08/29/2023")
print(type(pd_time2), pd_time2)
# Month written as an English abbreviation.
pd_time3 = pd.to_datetime("aug 29, 2023")
print(type(pd_time3), pd_time3)
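Output after execution:
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2023-08-29 17:17:22
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2023-08-29 17:17:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2023-08-29 00:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2023-08-29 00:00:00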
As the output above shows, to_datetime returns a Timestamp. Chinese-style dates such as "2024年2月1日" can be parsed as well by passing a custom format string to to_datetime, as in the following code:
# Parse a time string using a custom format string.
pd_time4 = pd.to_datetime("2024年2月1日", format="%Y年%m月%d日")
print(type(pd_time4), pd_time4)
The output is as follows:
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2024-02-01 00:00:00
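Custom format strings are also useful when a whole column has to be parsed at once. The sketch below assumes a small hypothetical Series named raw in which one entry is not a valid date; passing errors="coerce" turns unparseable values into NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical raw column; the last entry is deliberately invalid.
raw = pd.Series(["2024-02-01 09:31:52", "2024-02-02 18:05:00", "not a date"])

# An explicit format avoids guessing; errors="coerce" yields NaT on failure.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

print(parsed)
print("Unparseable rows:", parsed.isna().sum())
```

Checking the NaT count after parsing is a simple way to find out how much of a column did not match the expected format.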
As mentioned above, some data sources store time as a numeric timestamp. In that case the conversion looks like this:
time_value = 1620565604
# Convert the numeric timestamp to the Timestamp type, specifying seconds as the unit.
pd_time5 = pd.to_datetime(time_value, unit="s")
print(type(pd_time5), pd_time5)
The output is as follows:
<class 'pandas._libs.tslibs.timestamps.Timestamp'> 2021-05-09 13:06:44
One note here: to attach a time zone, call tz_localize("Asia/Shanghai") on the Timestamp.
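As a hedged illustration of the time zone note, the sketch below reuses the millisecond timestamp mentioned earlier and assumes it counts from the Unix epoch in UTC. tz_localize attaches a zone to a naive Timestamp without changing the clock reading, while tz_convert moves an already zone-aware Timestamp to another zone.

```python
import pandas as pd

# Millisecond timestamp (assumed to be Unix epoch time in UTC).
ms_value = 1698724307189
utc_time = pd.to_datetime(ms_value, unit="ms").tz_localize("UTC")

# Same instant, shown on the Asia/Shanghai clock (UTC+8).
shanghai_time = utc_time.tz_convert("Asia/Shanghai")

# A naive local reading: attach the zone without shifting the clock.
local_time = pd.Timestamp("2024-02-01 21:00:00").tz_localize("Asia/Shanghai")

print(utc_time)
print(shanghai_time)
print(local_time)
```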
# Create a Timestamp object by specifying the year, month, day, and so on separately.
pd_time7 = pd.Timestamp(year=2024, month=2, day=1, hour=21)
print(type(pd_time7), pd_time7)
# Get the current time.
pd_time8 = pd.Timestamp("now")
print(type(pd_time8), pd_time8)
The output is as follows:
Once we have a Timestamp object, we can read various time attributes from it. The common attributes and methods are as follows:
print("Current time object:", pd_time8)
print("Day of the week (Monday is 0):", pd_time8.dayofweek)
print("Day of the week as a string:", pd_time8.day_name())
print("Day of the year:", pd_time8.dayofyear)
print("Number of days in the month:", pd_time8.daysinmonth)
print("Is this year a leap year:", pd_time8.is_leap_year)
print("Is the date the last day of the month:", pd_time8.is_month_end)
print("Is the date the first day of the month:", pd_time8.is_month_start)
print("Is the date the last day of the quarter:", pd_time8.is_quarter_end)
print("Is the date the first day of the quarter:", pd_time8.is_quarter_start)
print("Is the date the last day of the year:", pd_time8.is_year_end)
print("Is the date the first day of the year:", pd_time8.is_year_start)
print("Current quarter:", pd_time8.quarter)
print("Current time zone:", pd_time8.tz)
print("Week of the year:", pd_time8.week)
print("Year:", pd_time8.year)
print("Month:", pd_time8.month)
print("Day:", pd_time8.day)
print("Hour:", pd_time8.hour)
print("Minute:", pd_time8.minute)
print("Second:", pd_time8.second)
The output is as follows:
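The same attributes are available for a whole column through the .dt accessor of a datetime Series. The sketch below uses a hypothetical orders table with an order_time column; the names and values are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical order data; order_time is parsed into datetime values.
orders = pd.DataFrame({
    "order_time": pd.to_datetime([
        "2024-02-01 09:31:52",
        "2024-02-03 21:05:10",
        "2024-02-29 00:00:00",
    ])
})

# .dt exposes the Timestamp attributes element-wise.
orders["weekday"] = orders["order_time"].dt.dayofweek      # Monday is 0
orders["day_name"] = orders["order_time"].dt.day_name()
orders["hour"] = orders["order_time"].dt.hour
orders["is_month_end"] = orders["order_time"].dt.is_month_end

print(orders)
```

Derived columns like these are what make it possible to group orders by weekday or hour in later analysis.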
Put simply, time calculation means working out what time it will be a given number of minutes, hours, or seconds before or after a given moment. Doing this by hand is awkward because the units are not decimal: the number of days differs from month to month, and converting between seconds, minutes, and hours involves factors of 60. pandas handles these conversions for us. Time calculation in pandas is done by combining Timestamp objects with Timedelta objects, and a Timedelta can be created in several ways:
# Create from strings.
delta1 = pd.Timedelta("0.5 days")
print("Half a day:", delta1)
delta2 = pd.Timedelta("2 days 3 hour 20 minutes")
print("2 days, 3 hours and 20 minutes:", delta2)
delta3 = pd.Timedelta("1 days 20:36:00")
print("1 day, 20 hours and 36 minutes:", delta3)
# Create from unit keyword arguments.
delta4 = pd.Timedelta(days=1.5)
print("One and a half days:", delta4)
delta5 = pd.Timedelta(days=10, hours=9)
print("Ten days and nine hours:", delta5)
# Create from unit abbreviations:
# w: weeks
# d: days
# h: hours
# m: minutes
# s: seconds
delta6 = pd.Timedelta("2w3d")
print("Two weeks and three days:", delta6)
delta7 = pd.Timedelta("6h30m12s")
print("6 hours, 30 minutes, 12 seconds:", delta7)
# Get the current time.
current_time = pd.Timestamp("now")
print("Current time:", current_time)
# Get the time two weeks before the current time.
two_week_ago = current_time - pd.Timedelta("2w")
print("Two weeks ago:", two_week_ago)
# Get the time 30 days and 7 hours after the current time.
future_time = current_time + pd.Timedelta("30d7h")
print("Time after 30 days and 7 hours:", future_time)
The output after execution is as follows:
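Because the example above depends on the current time, its output changes on every run. The following sketch repeats the same arithmetic with a fixed, made-up starting point so the results can be checked by hand; it also shows total_seconds() and components, which break a Timedelta back down into plain numbers.

```python
import pandas as pd

# Fixed starting point (an arbitrary value chosen for illustration).
start = pd.Timestamp("2024-02-01 09:00:00")
delta = pd.Timedelta(days=30, hours=7)

later = start + delta                        # 2024-03-02 16:00:00
earlier = start - pd.Timedelta(weeks=2)      # 2024-01-18 09:00:00

print("Later:", later)
print("Earlier:", earlier)
print("Total seconds:", delta.total_seconds())   # 2617200.0
print("Days / hours:", delta.components.days, delta.components.hours)
```

Note that February 2024 has 29 days, which is why adding 30 days to February 1 lands on March 2.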
Besides combining a Timedelta with a Timestamp, two Timestamps can be subtracted to get a duration:
# Create 8 a.m. on last year's National Day.
national_day = pd.to_datetime("2023-10-01 08:00:00")
# Calculate the Timedelta between the current time and National Day.
delta8 = current_time - national_day
print("Time elapsed since last year's National Day:", delta8)
The output is:
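Timedelta always represents a fixed duration. When the shift has to follow the calendar instead, as mentioned in the list of common scenarios, DateOffset and the classes in pd.offsets can be used. The dates in this sketch are chosen only for illustration.

```python
import pandas as pd

ts = pd.Timestamp("2024-01-31 10:00:00")

# Fixed duration: exactly 30 * 24 hours later.
plus_30_days = ts + pd.Timedelta(days=30)          # 2024-03-01 10:00:00

# Calendar shift: same day next month, clamped to the month's last day.
plus_one_month = ts + pd.DateOffset(months=1)      # 2024-02-29 10:00:00

# Business-day shift: 2024-02-02 is a Friday, so the next
# business day is Monday 2024-02-05 at the same clock time.
friday = pd.Timestamp("2024-02-02 10:00:00")
next_business_day = friday + pd.offsets.BDay(1)

print(plus_30_days)
print(plus_one_month)
print(next_business_day, next_business_day.day_name())
```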
Let's look at how to use time to query the data in a DataFrame.
You can prepare the sample data yourself, for example by creating a new CSV file of simulated data; if you would rather not, leave a message.
After loading the sample data into a DataFrame (called df_log below), set the time field as a DatetimeIndex. The conversion takes two steps:
Step 1: Convert the time column to Timestamp objects.
# Convert the time column to Timestamp objects.
df_log["time"] = pd.to_datetime(df_log["time"])
# Look at the time column.
df_log["time"]
After execution, the output ends with:
Name: time, Length: 1000, dtype: datetime64[ns]
Step 2: Set the new time column as the index.
# Set the time column as the index of df_log.
df_log.set_index("time", inplace=True)
# Check the updated DataFrame.
df_log
Post-execution output:
As you can see from the figure, the time column has replaced the previous sequence number and has become the index of the dataframe.
After the above steps are completed, we can filter the data in the dataframe by time.
# Select data from September 1 to September 15.
df_log.loc["2018-09-01":"2018-09-15", :]
# Select data from August to September.
df_log.loc["2018-08":"2018-09", :]
# Select data from August 1 up to 2 p.m. on September 2.
df_log.loc["2018-08-01":"2018-09-02 14:00:00", :]
After executing the code above, we can see the rows of the DataFrame filtered by time.
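For readers without a log file at hand, the whole flow can be tried with simulated data. The sketch below is an assumption-laden stand-in for the real data: it builds a hypothetical df_log with one record every 6 hours and a made-up status column, stores the time as strings as a CSV file would, and then repeats the two conversion steps and the three queries.

```python
import pandas as pd

# Simulated log: 1000 records, one every 6 hours from 2018-08-01.
times = pd.date_range("2018-08-01", periods=1000, freq=pd.Timedelta(hours=6))
df_log = pd.DataFrame({
    "time": times.strftime("%Y-%m-%d %H:%M:%S"),   # strings, as in a CSV
    "status": [500 if i % 10 == 0 else 200 for i in range(1000)],
})

# Step 1: convert the time column; Step 2: use it as the index.
df_log["time"] = pd.to_datetime(df_log["time"])
df_log = df_log.set_index("time").sort_index()

# Partial date strings select whole days or whole months.
first_half_sep = df_log.loc["2018-09-01":"2018-09-15"]
aug_to_sep = df_log.loc["2018-08":"2018-09"]
until_afternoon = df_log.loc["2018-08-01":"2018-09-02 14:00:00"]

print(len(first_half_sep), len(aug_to_sep), len(until_afternoon))
```

The sort_index() call is a precaution added here: slicing by time is more reliable when the index is in chronological order, which a real log may not guarantee.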