4.3.1 Problem Formulation
This section discusses the intuition behind applying Bag-of-Words and skip-gram models to organismal movement data, specifically geo-spatial trajectories. The aim is to build an abstract bridge linking the concept of contextual word embeddings to embeddings of geo-spatial key points in animal trajectories. To this end, the collected data, which consist of sequences of geo-spatial coordinates recorded from the focal organisms, can be thought of as sequences of sound frequencies. A segment of these coordinates constitutes a trajectory segment represented by a key point, just as frequency segments make up spoken words. In language processing, to account for slight variations in sound, the centroids or the samples closest to the centroids are chosen. Likewise, geo-spatial centroids are used for key point selection in the case of trajectories. As a result, trajectories are composed of series of key points, as sentences are of words. Theoretically, the arrangement of data points in the embedding space is expected to represent the semantic relations between key points.
In the previous chapter, the movement models were constructed under the assumption of exchangeability. This was successful to a certain extent in modeling navigation strategies while ignoring the temporal order of the navigated key points. The approach is applicable where dynamical constraints are relaxed, since conditional independence is assumed between the key points of a trajectory given navigation capacity, motion capacity, internal state, and environmental factors. The probability of a set of navigation key points can then be written as follows.
\[ p(u_0, \cdots, u_K \mid \Omega, \Phi, W, R) = \prod_{k \in K} p(u_k \mid \Omega, \Phi, W, R) \tag{4.7} \]
With that, it is possible to adjust the extent of the temporal window to which these key points belong, from an hour to an entire trip. The appropriate extent depends on the dynamical properties that determine how far the sequential constraints can be loosened. For instance, the effects of wind over an entire trip could be ignored, as animals compensate for such deviations in their trajectories [13]. In this study, this approach is improved by including contextual information in the modeling of the trajectories' key points. This information could be temporal, spatial, or from any other semantic domain. It is described as
\[ p(u_k \mid \Omega, \Phi, w_c, r_c, u_c) \sim p(u_k \mid \Lambda_k) \tag{4.8} \]
where $\Lambda_k$ is the contextual feature vector for $u_k$. The subscript $c$ identifies the contextual information for the corresponding variables. Theoretically, over the course of a trajectory of length $K$, the joint probability of the key points given their context should be maximized as
\[ \arg\max_{\Lambda_k} \prod_{k \in K} p(u_k \mid \Lambda_k) \tag{4.9} \]
The main motivation for this approach comes from [11], where contemporary environmental contexts were also found to influence the navigation strategies of Streaked Shearwaters. For example, for foraging behavior, temporal context, which implies similar environmental factors, could effectively provide clues about the gender of the shearwaters based on a set of trajectory key points with their contextual features.
At this point, what remains is the modeling of $p(u_k \mid \Lambda_k)$ and $\Lambda_k$, which is addressed later in this section. Before that, the key point extraction method is described.
4.3.2 Key Point Extraction
As discussed earlier, sampling frequency and the segmentation of trajectories are among the challenges in movement ecology. The quality of the information carried by representative key points or segments has a substantial role in downstream results. The objective of the analysis is also a major factor in selecting the compression or segmentation method. For instance, for analyzing navigation capacity, the start and destination of trip segments are generally extracted. When analyzing movement paths, the shape of the segments is the feature to be preserved. Here, the target of analysis is navigation behavior rather than the shape of local path segments. Therefore, representative points of trajectory segments are extracted using the DBSCAN method. DBSCAN requires no initial guess for the number of clusters and is able to identify noise. In addition, its relatively higher performance in identifying densities [142] makes it the method of choice here. It is worth noting that, given a fixed sampling rate, destinations of navigation, activity or resting locations, and path intersections can be extracted by setting appropriate clustering parameters. These segments represent a trajectory sequence. Generally, unlike key points of human trajectories such as schools, cinemas, hotels, and restaurants, which carry predefined semantic or functional information, key points of animal trajectories, specifically in the case of seabirds, neither have fixed locations nor are functionally predetermined. One way to approach this problem is to create hierarchical key point clusters. This method identifies spatial super key points that are shared between trajectories, and each of the member key points can have a semantic feature assigned to it. For instance, common pathways, foraging spots, or prey patches at sea could be identified using this method.
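The extraction step described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it is a plain O(n^2) DBSCAN written with the Python standard library, the great-circle (haversine) distance is an assumed choice of metric, and the track coordinates, `eps_km`, and `min_samples` are hypothetical values chosen only for illustration. Following the centroid idea of Section 4.3.1, each cluster is represented by the recorded fix closest to its centroid; averaging latitudes and longitudes is an approximation that is only reasonable for small clusters.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def dbscan(points, eps_km, min_samples):
    """Plain O(n^2) DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # Neighbourhoods include the point itself (distance 0).
    neighbours = [[j for j in range(n)
                   if haversine_km(points[i], points[j]) <= eps_km]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels

def key_points(points, labels):
    """For each cluster, pick the recorded fix closest to the cluster centroid."""
    reps = {}
    for c in sorted(set(labels) - {-1}):
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = (sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members))
        reps[c] = min(members, key=lambda p: haversine_km(p, centroid))
    return reps

# Hypothetical track: two dense stops separated by two isolated transit fixes.
track = [
    (39.000, 141.900), (39.001, 141.901), (39.002, 141.899), (38.999, 141.902),
    (39.500, 142.200), (40.200, 142.600),
    (41.000, 143.000), (41.001, 143.002), (40.999, 142.999), (41.002, 143.001),
]
labels = dbscan(track, eps_km=1.0, min_samples=3)
reps = key_points(track, labels)
```

With these assumed parameters the two dense groups of fixes become clusters and the isolated transit fixes are labelled as noise. In practice, `eps_km` and `min_samples` would have to be tuned to the sampling rate and to the kind of location (resting site, foraging spot, path intersection) that is to be captured.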
4.3.3 Model Construction and Optimization
As mentioned earlier, one way to create contextual representation vectors of key points is to maximize (4.9). The probability function $p(u_k \mid \Lambda_k)$ can be modeled with a SoftMax function as
\[ p(u_k \mid \Lambda_k) = \frac{e^{\Lambda_k \cdot v_k}}{\sum_{k' \in K} e^{\Lambda_k \cdot v_{k'}}} \tag{4.10} \]
where $K$ is the set of all key points in the habitat. $\Lambda_k$ is a $D$-dimensional contextual feature vector, defined as
\[ \Lambda_k = \sum_{c \in C} v_c \tag{4.11} \]
where $C$ is the size of the set of context key points over which the sum runs. The loss function can be written as
\[ -\sum_{n \in N} \log p(u_n \mid \Lambda_n) \tag{4.12} \]
where $N$ is the size of the training set over which the sum runs. This approach is analogous to the continuous Bag-of-Words (CBOW) model in NLP.
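As a concrete reading of (4.10) to (4.12), the following minimal sketch evaluates the context vector, the SoftMax probabilities, and the loss for a toy embedding table. The table `V`, the samples, and the function names are hypothetical and serve only to illustrate the formulas. A single table is used for both context and target vectors, as the equations are written.

```python
import math

def context_vector(V, context_ids):
    """Eq. (4.11): sum the embedding vectors of the context key points."""
    D = len(V[0])
    return [sum(V[c][d] for c in context_ids) for d in range(D)]

def softmax_probs(V, lam):
    """Eq. (4.10): SoftMax over the scores Lambda_k . v_k' of all key points k'."""
    scores = [sum(l * v for l, v in zip(lam, row)) for row in V]
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def cbow_loss(V, samples):
    """Eq. (4.12): negative log-likelihood over (target, context ids) samples."""
    return -sum(math.log(softmax_probs(V, context_vector(V, ctx))[tgt])
                for tgt, ctx in samples)

# Hypothetical table: 5 key points embedded in D = 2 dimensions.
V = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]]
# Each sample pairs a target key point with the ids of its context key points.
samples = [(0, [1, 2]), (3, [4])]

probs = softmax_probs(V, context_vector(V, [1, 2]))
loss = cbow_loss(V, samples)
```

In this toy table, key point 0 lies close to its context key points 1 and 2, so it receives the largest probability. Training would adjust `V` to lower `loss` over all samples.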
In [138], it was suggested that CBOW is faster and more suitable for larger data sets, whereas skip-gram produces better representations for smaller data sets. Therefore, the model proposed here follows the skip-gram approach. The objective is to generate the context key points given a key point. The model is a simple feedforward neural network with a single hidden layer. The input layer is the one-hot vector of a key point, and the output layer is a SoftMax function that estimates the probabilities of context points given that key point. As in CBOW, this network projects each discrete key point to a point in a $D$-dimensional continuous vector space. The $D$-dimensional representation vector is then projected back to a $K$-dimensional continuous vector. Conceptually, the optimized network should project key points to the vicinity of their context key points. Algebraically, this network is described by the following equations
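\[ \hat{p}(k_c \mid h_n) = \frac{e^{\hat{k}_c}}{\sum_{k' \in K} e^{\hat{k}_{k'}}} \tag{4.13} \]
\[ \hat{k}_c = \Theta^{T} h_n, \quad \Theta \in \mathbb{R}^{D \times K} \tag{4.14} \]
\[ h_n = k_n^{T} \Phi, \quad \Phi \in \mathbb{R}^{K \times D} \tag{4.15} \]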
in which $\hat{p}(k_c \mid k_n)$ is the estimated SoftMax probability of the sample context key point $\hat{k}_c$ given the key point $k_n$ in the data set, and $h_n$ is its representation vector in the embedding space. A schematic diagram of this model is shown in Figure 4.1.
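The forward pass in (4.13) to (4.15) can be traced with the following minimal sketch. The sizes `K` and `D`, the random initialization, and the function name `forward` are assumptions made for illustration; they are not taken from the model used in this study.

```python
import math
import random

def forward(phi, theta, n):
    """Forward pass for key point n; phi is K x D and theta is D x K.

    Returns the representation vector h_n and the SoftMax probabilities
    over all K key points.
    """
    K, D = len(phi), len(phi[0])
    one_hot = [1.0 if i == n else 0.0 for i in range(K)]
    # Eq. (4.15): h_n = k_n^T Phi, which simply selects row n of Phi.
    h = [sum(one_hot[i] * phi[i][d] for i in range(K)) for d in range(D)]
    # Eq. (4.14): project back to K dimensions, one logit per key point.
    logits = [sum(theta[d][k] * h[d] for d in range(D)) for k in range(K)]
    # Eq. (4.13): SoftMax over the logits (maximum subtracted for stability).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return h, [e / s for e in exps]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
K, D = 5, 3             # hypothetical sizes
phi = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(K)]
theta = [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(D)]

h_n, p_hat = forward(phi, theta, n=2)
```

Because the input is one-hot, the multiplication in (4.15) reduces to a row lookup in `phi`, which is why the rows of `phi` serve as the key point embeddings.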
To train this network, the noise-contrastive estimation (NCE) function [141] is selected as the loss function, and stochastic gradient descent is used for optimization. This combination provides both scalability and reliability to the training process. The network is optimized so that the output probabilities are significantly higher for the context key points than for the rest. The context size $C$ and the dimension of the representation space $D$ are adjusted manually to achieve the best results.
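The following minimal sketch shows how training pairs could be drawn from key point sequences with a context window and how one stochastic gradient descent update proceeds. For brevity it swaps the NCE loss for the full SoftMax cross-entropy, which is practical only for a small number of key points, so it should not be read as the training procedure used in this study. The trips, sizes, learning rate, and function names are hypothetical.

```python
import math
import random

def context_pairs(sequence, window):
    """Pair each key point with the key points within +/- window positions."""
    pairs = []
    for i, centre in enumerate(sequence):
        lo, hi = max(0, i - window), min(len(sequence), i + window + 1)
        pairs.extend((centre, sequence[j]) for j in range(lo, hi) if j != i)
    return pairs

def softmax(z):
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def sgd_step(phi, theta, n, c, lr):
    """One SGD update on -log p(c | n); returns the loss before the update."""
    D, K = len(theta), len(theta[0])
    h = phi[n][:]  # copy of the representation vector of key point n
    probs = softmax([sum(theta[d][k] * h[d] for d in range(D)) for k in range(K)])
    # Gradient of the loss with respect to the logits: p - one_hot(c).
    err = [p - (1.0 if k == c else 0.0) for k, p in enumerate(probs)]
    # Gradient with respect to h, computed before theta is modified.
    grad_h = [sum(theta[d][k] * err[k] for k in range(K)) for d in range(D)]
    for d in range(D):
        for k in range(K):
            theta[d][k] -= lr * h[d] * err[k]
        phi[n][d] -= lr * grad_h[d]
    return -math.log(probs[c])

rng = random.Random(0)
K, D, C = 4, 2, 1  # hypothetical sizes and context window
trips = [[0, 1, 2, 3], [0, 1, 2, 3], [3, 2, 1, 0]]  # key point ids per trip
pairs = [p for trip in trips for p in context_pairs(trip, C)]
phi = [[rng.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(K)]
theta = [[rng.uniform(-0.5, 0.5) for _ in range(K)] for _ in range(D)]

losses = []
for epoch in range(200):
    rng.shuffle(pairs)
    epoch_loss = sum(sgd_step(phi, theta, n, c, lr=0.1) for n, c in pairs)
    losses.append(epoch_loss / len(pairs))
```

The mean loss in `losses` decreases as the probabilities of the observed context key points rise relative to the rest. With NCE, the normalizing sum over all key points would be replaced by a comparison against a few sampled noise key points, which is what makes training scale to large key point sets.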
Figure 4.1: Each key point, together with its semantic features, is encoded as a one-hot vector $k_n$, which is projected onto the embedding space as $h_n$ and then projected back to the original space. After applying SoftMax, the loss is computed against the sampled context key points $k_c$. The output is interpreted as the probability that $\hat{k}_c$ appears in the sampled context key points of $k_n$.