4.4 Experiments, Results and Discussion
4.4.1 Trajectory Data Exploration
Trajectories are sequences of spatial data points. The length of these sequences can vary significantly while their spatial footprints change only slightly. Comparing trajectories with each other is therefore not always straightforward. Methods such as dynamic time warping and sampling are available to deal with these challenges, but most of them do not create an efficient metric numerical representation for the components of trajectories. Here we show how such metric representations help researchers explore and analyze relationships between trajectories.
The first stage involves extracting key points from trajectories. Since we are interested in navigation trends, the beginnings and destinations of travel segments in trajectories are ideal key points. Assuming that these end points are clusters of points with low or stationary speeds, only points with speeds lower than 2 m/s were considered for clustering. For DBSCAN, the neighborhood radius and the minimum number of neighbors were set to 1.5 km and 10, respectively, which resulted in 667 clusters. Out of these, 500 were selected as key points based on their ubiquity and frequency in trajectories. One-hot encoded vectors of these key points were then used as input to the skip-gram model. Initially, the embedding vector size was set to 128, the number of negative samples for the NCE loss was set to 8, and the context window length was set to 10 in both the past and future directions (the role of parameters such as the embedding vector size in the obtained results is discussed later in this section).
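The key point extraction step can be illustrated with a minimal sketch, assuming each GPS fix is a (latitude, longitude, speed in m/s) tuple. It uses a plain O(n^2) DBSCAN with haversine distances. The names `haversine_km` and `dbscan`, the toy `records`, and the convention that a point counts among its own neighbors are assumptions made for illustration; the text does not state which DBSCAN implementation or distance function was used.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dbscan(points, eps_km=1.5, min_pts=10):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 meaning noise.

    Assumption: a point is counted among its own neighbors.
    """
    n = len(points)
    neighbors = [[j for j in range(n)
                  if haversine_km(points[i], points[j]) <= eps_km]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point, keep expanding
    return labels

# Toy data (hypothetical): two resting sites, two fast fixes, one isolated slow fix.
records = (
    [(38.46 + 0.0005 * k, 139.24, 0.5) for k in range(12)]
    + [(38.80, 139.60 + 0.0005 * k, 1.0) for k in range(12)]
    + [(38.60, 139.40, 9.0), (38.70, 139.50, 12.0)]
    + [(39.50, 140.50, 0.2)]
)

# Keep only low-speed fixes (< 2 m/s), then cluster them.
slow = [(lat, lon) for lat, lon, speed in records if speed < 2.0]
labels = dbscan(slow, eps_km=1.5, min_pts=10)
n_clusters = len(set(labels) - {-1})
print(n_clusters, "clusters,", labels.count(-1), "noise point(s)")
```

For data sets of realistic size, a library implementation with a spatial index would be preferable to this quadratic version.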
The network was then trained in batches of 32 until the average loss no longer changed significantly, which occurred at about 1e6 steps.
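The way training examples are formed from a key point sequence can be sketched as follows, assuming each trajectory has already been reduced to a list of integer key point ids (standing in for the one-hot vectors). The helper names `skipgram_pairs`, `batches` and `negative_samples` are hypothetical, and drawing negatives uniformly at random is a simplification: the noise distribution used with the NCE loss is not specified in the text.

```python
import random

def skipgram_pairs(sequence, window=10):
    """Yield (center, context) pairs within `window` steps in both directions."""
    for i, center in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, sequence[j]

def batches(pairs, batch_size=32):
    """Split a list of pairs into consecutive batches of at most `batch_size`."""
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]

def negative_samples(vocab_size, positive, k=8, rng=random):
    """Draw k key point ids uniformly at random, excluding the true context id."""
    out = []
    while len(out) < k:
        candidate = rng.randrange(vocab_size)
        if candidate != positive:
            out.append(candidate)
    return out

# Hypothetical trajectory expressed as key point ids (500 key points in total).
trajectory = [3, 17, 17, 42, 8, 3]
pairs = list(skipgram_pairs(trajectory, window=2))
first_batch = next(batches(pairs, batch_size=32))
negatives = negative_samples(500, positive=pairs[0][1], k=8, rng=random.Random(0))
print(len(pairs), pairs[:3], negatives)
```

A window of 2 is used here only because the toy sequence is short; the experiments in this section start from a window of 10.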
To visualize the resulting embedding vectors, the dimensionality reduction methods PCA [144] and t-SNE [145] were used to create 2d visualizations. The results are shown in Figure 4.2. The embedding vectors are visibly pulled close to each other in some
[Figure 4.2, panels (a)-(d). Panel (c) axes: 1st Component, 2nd Component. Panel (d) axes: 1st Principal Component, 2nd Principal Component; densities labeled 0 to 11.]
Figure 4.2: Trajectories, selected key points and visualizations of their embedding. (a) Trajectories in geo-spatial space. The colony is designated by ∧. (b) Extracted key points in geo-spatial space. Marker size conveys the number of individual trajectories sharing the key point. (c) 2d visualization of key point embeddings using t-SNE. (d) Identified densities of key point embeddings using the first 2 principal components. Densities with a minimum of 5 members within a distance of 0.03 are highlighted. Centroids are designated by △.
regions, while some points are positioned farther from the rest. To analyze the underlying structure, vectors were sampled from local neighborhoods in the embedding space and projected back to geo-spatial space. For instance, the key points associated with clusters 11, 5, 10 and 3 in Figure 4.2(d) were projected back to geo-spatial space and plotted along with their corresponding trajectories in Figure 4.3. The relative distances between these key points, both within and between clusters, are clearly not the same in the two spaces. For example, key points in clusters 11 and 5 share major geo-spatial bounding regions, while in the embedding space they have no shared bounding regions. These points are geo-spatially close but are not in each other's immediate neighborhood in the embedding space. In fact, even though they appear geo-spatially close, they belong to trajectories with different geometries. Distances in the embedding space can thus differ from those in the corresponding geo-spatial space, because the embedding space in this instance is optimized to represent the sequential patterns and semantics of key points.
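The contrast between the two notions of distance can be made concrete with a small sketch. The three key points, their coordinates and their 3-dimensional embedding vectors below are invented for illustration, and the use of Euclidean distance in the embedding space is an assumption.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

# Hypothetical key points: A and B are geo-spatial neighbors that lie on
# differently shaped trajectories; A and C are far apart geographically but
# are visited in similar sequential contexts.
key_points = {
    "A": {"pos": (38.46, 139.24), "vec": (1.0, 0.1, 0.0)},
    "B": {"pos": (38.47, 139.25), "vec": (0.0, 1.0, 0.2)},
    "C": {"pos": (39.40, 140.10), "vec": (0.9, 0.2, 0.1)},
}

def geo(a, b):
    """Geo-spatial distance in km between two named key points."""
    return haversine_km(key_points[a]["pos"], key_points[b]["pos"])

def emb(a, b):
    """Euclidean distance between the embedding vectors of two key points."""
    return math.dist(key_points[a]["vec"], key_points[b]["vec"])

for a, b in [("A", "B"), ("A", "C")]:
    print(f"{a}-{b}: geo = {geo(a, b):7.2f} km, embedding = {emb(a, b):.3f}")
```

In this toy setting the pair that is closest on the map is the farthest apart in the embedding space, which is the kind of mismatch described above for clusters 11 and 5.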
Therefore, key points that are traversed consecutively may not be in a close neighborhood in geo-spatial space, while in the embedding space they appear closer to each other. This is analogous to text mining, where an optimized semantic embedding space pulls “king” closer to “father” than to “wing”, even though “king” and “wing” are closer in character-based measurements. Furthermore, repetitive or cyclic trajectory segments produce concentrated densities in both the geo-spatial and the embedding space, as shown in Figure 4.4(c) and Figure 4.4(d). This is because the subsequent key points of such segments are located in a close geo-spatial neighborhood.
In Figure 4.4, the trajectories corresponding to key points in clusters 8, 9, 2 and 4 are illustrated. Clusters 8 and 9 have visually closer trajectories than clusters 2
[Figure 4.3, panels (a)-(d): map panels labeled 38.46°N, 139.24°E, each with a 200 km scale bar.]
Figure 4.3: Corresponding trajectories of the key point clusters in Figure 4.2(d). Key points are identified by ▶. The colony is designated by ∧. (a) Cluster 11 (b) Cluster 5 (c) Cluster 10 (d) Cluster 3
and 4. Meanwhile, cluster 8 could be regarded as a transition point between clusters 9 and 4, and likewise between clusters 2 and 4. This underlying structure could be very useful for identifying navigational behaviors between species or discovering relationships between trajectories. For instance, for clusters 4, 8, 9 and 2, the proportions of connected trajectories belonging to male birds are about 95%, 89%, 77% and 52%, respectively. It should be noted, however, that given the construction of this data set, it is unlikely that these trajectories belong to a single individual bird.
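A per-cluster proportion of this kind can be computed as in the sketch below. It assumes a mapping from each embedding cluster to the ids of the trajectories passing through its key points, and a sex label per trajectory. The function name, the toy ids and the labels are hypothetical and do not reproduce the data set used here.

```python
def male_share_by_cluster(cluster_to_trajectories, trajectory_sex):
    """Return, per cluster, the fraction of connected trajectories labeled 'M'."""
    shares = {}
    for cluster, trajectory_ids in cluster_to_trajectories.items():
        unique_ids = set(trajectory_ids)   # count each trajectory once
        if not unique_ids:
            continue                       # skip clusters with no trajectories
        males = sum(1 for t in unique_ids if trajectory_sex[t] == "M")
        shares[cluster] = males / len(unique_ids)
    return shares

# Toy input (hypothetical): trajectory ids connected to two clusters.
cluster_to_trajectories = {
    "cluster_x": ["t1", "t2", "t3", "t4", "t1"],   # t1 passes through twice
    "cluster_y": ["t3", "t5"],
}
trajectory_sex = {"t1": "M", "t2": "M", "t3": "M", "t4": "F", "t5": "F"}

shares = male_share_by_cluster(cluster_to_trajectories, trajectory_sex)
for cluster, share in shares.items():
    print(f"{cluster}: {share:.0%} male")
```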
These differences could well be advantageous in gender classification of trajectories. As mentioned previously, the environment is also an influential factor in the generation of navigation strategies. As a result, the clusters of key points in the embedding space could be attributed to certain weather conditions or even to habitat features such as rivers and coasts. However, due to the absence of calendar information for this data set, it was not possible to test this against the weather conditions at the time of travel.
Up to this point, only sequential semantics in the spatial domain were used in the construction of embedding vectors. The next section discusses the use of contextual information from other domains, such as time and activity. Before proceeding to the next experiment, however, it is worth discussing the tuning of the key model parameters, namely the dimension of the representation space and the context window size, and their effects on the captured information. The information capacity of embedding vectors is directly related to their dimensionality, that is, the dimension of the representation space. To examine this, the size of the hidden layer and the size of the context window in the embedding network were modified individually, and the corresponding training results were compared. Part of these results are shown in Figure 4.5. In Figure 4.5(a) and Figure 4.5(b), the moving average of the validation error for the last 5e5 steps and the mean and standard deviation of the validation error for the last 1e5 steps in training of different model
configurations are shown, respectively. The trend of the error curves in Figure 4.5(a) shows that no significant improvement was to be expected beyond 1e6 training steps. In Figure 4.5(b), networks with a smaller hidden layer dimension show a higher NCE loss at larger context window sizes, while showing a lower standard deviation.
[Figure 4.4, panels (a)-(d): map panels labeled 38.46°N, 139.24°E, each with a 200 km scale bar.]
Figure 4.4: Corresponding trajectories of the key point clusters in Figure 4.2(d). Key points are identified by ▶. The colony is designated by ∧. (a) Cluster 8 (b) Cluster 9 (c) Cluster 2 (d) Cluster 4
[Figure 4.5, panels (a)-(b). Panel (a) axes: Step (5e+05 to 1e+06), Mean Val. Err. Panel (b) axis: Mean Val. Err. per configuration. Configurations: H16C1, H16C4, H16C8, H16C16, H32C1, H32C4, H32C8, H32C16, H128C1, H128C4, H128C8, H128C16.]
Figure 4.5: Average validation errors for training different model configurations. The numbers following the letters H and C represent the dimensionality of the embedding space and the context window size. The number of negative samples and skip samples are set to 4 and 8, respectively, and the context window spans both directions. (a) Average of validation errors for different models in the last 5e5 steps of training. 1e6 steps are evidently sufficient for training, as no further improvement is observed. (b) Average and standard deviation of the validation error for the last 1e5 steps of training. This demonstrates that a larger context window size requires a larger embedding vector, although this increases the standard deviation.
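The two summaries reported for Figure 4.5, a moving average of the validation error and the mean and standard deviation over the final steps, can be sketched as follows. The function names, the window lengths and the toy error values are assumptions for illustration, as is the use of the population standard deviation; the `H<dim>C<window>` label format follows the figure legend.

```python
import statistics

def moving_average(values, window):
    """Trailing moving average; the first output covers values[0:window]."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the value leaving the window
        if i >= window - 1:
            out.append(running / window)
    return out

def tail_summary(values, n_last):
    """Mean and population standard deviation of the last n_last values."""
    tail = values[-n_last:]
    return statistics.mean(tail), statistics.pstdev(tail)

def config_name(hidden_dim, context_window):
    """Label in the style of the figure legend, e.g. H128C8."""
    return f"H{hidden_dim}C{context_window}"

# Toy validation errors logged at regular intervals (hypothetical values).
errors = [4.0, 3.0, 2.0, 2.0, 2.0, 2.0]
smoothed = moving_average(errors, window=3)
mean_err, std_err = tail_summary(errors, n_last=3)
print(config_name(128, 8), smoothed, mean_err, std_err)
```

A flat tail in the smoothed curve together with a small tail standard deviation is the kind of evidence used above to argue that training beyond 1e6 steps brings no further improvement.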
A large context window attempts to map points farther along the sequence into the close vicinity of the sampled point, even though these points may not be geo-spatially close to it. If the context window size is set to 1, the embedding may therefore appear evenly distributed, because few points in trajectories share the same immediate neighbor key points. If, on the other hand, the context window size is increased, key points are more likely to share contextual key points, which results in the emergence of larger clusters in the embedding space. This effect can be seen in Figure 4.6, which shows 2d t-SNE embeddings of the representation vectors trained with different context window sizes. The context window size is set to 1 for the embeddings shown in Figure 4.6(a) and to 8 for those shown in Figure 4.6(b). With the small window size, the representation vectors are spread evenly in small concentrations, whereas the larger context window yields larger, more pronounced concentrations. In the end, the best choices for the hidden layer size and the context window size depend on the application and involve trade-offs.
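The effect of the window size on shared context can be illustrated without training a model. The sketch below collects, for every key point, the set of key points that fall inside its context window and then measures the average Jaccard overlap between these sets. The two toy trajectories and the helper names are hypothetical, and the overlap measure is only a rough proxy for what the skip-gram objective rewards.

```python
from itertools import combinations

def context_sets(trajectories, window):
    """Map each key point to the set of key points seen within `window` steps."""
    ctx = {}
    for seq in trajectories:
        for i, kp in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            ctx.setdefault(kp, set()).update(
                seq[j] for j in range(lo, hi) if j != i)
    return ctx

def mean_jaccard(ctx):
    """Average Jaccard overlap of context sets over all pairs of key points."""
    pairs = list(combinations(sorted(ctx), 2))
    total = 0.0
    for a, b in pairs:
        union = ctx[a] | ctx[b]
        if union:
            total += len(ctx[a] & ctx[b]) / len(union)
    return total / len(pairs)

# Two hypothetical trajectories that cross at key point "C".
trajectories = [
    ["A", "B", "C", "D", "E", "F"],
    ["G", "H", "C", "I", "J", "K"],
]

overlap = {w: mean_jaccard(context_sets(trajectories, w)) for w in (1, 8)}
for w, value in overlap.items():
    print(f"window = {w}: mean context overlap = {value:.3f}")
```

With a window of 1, most pairs of key points share no context at all, while a window of 8 makes the context sets overlap heavily, consistent with the larger concentrations described above.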