When Gong analyzes calls, it uses multiple methods to determine who was on the call and when they spoke.
This information is used to calculate stats and is shown visually on the call page to help you quickly navigate to the relevant part of the discussion.

The first step in speaker identification is dividing the call into segments, each associated with a single speaker. Gong does this in one of two ways:
- Conference calls: When Gong records a web or video conference, we look up the participant list during the call to get a rough estimate of who is present and when each participant speaks. Conferencing systems tend to report speaker switches with large delays, so the timing information we get from them is often inaccurate. To address this, we apply a proprietary refined speaker separation algorithm that identifies short speech segments (for example, "Yes" or "OK") and attributes them to the right speaker, even when the conferencing system did not report a speaker switch or reported it with a delay.
- Telephony calls: When Gong receives stereo recordings, we use the two channels to determine the speakers. We assume that each channel corresponds to one of the two speakers, and do not attempt to divide the call further. When Gong receives mono recordings, we separate the single audio channel into as many channels as there are speakers, based on the differences between voices, in a process known as diarization.
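
To make the stereo case concrete, here is a minimal Python sketch of how per-channel activity could be turned into single-speaker segments and simple talk-time stats. It is an illustration only, not Gong's implementation: the `Segment` type, the `channel_segments` and `talk_time` functions, the per-frame energy input, the 0.5-second frame length, and the 0.2 energy threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # hypothetical label, e.g. "left" or "right" channel
    start: float   # seconds from the start of the call
    end: float     # seconds from the start of the call


def channel_segments(speaker, energies, frame_seconds=0.5, threshold=0.2):
    """Group consecutive frames with energy >= threshold into segments.

    energies: one energy value per frame for a single channel.
    frame_seconds and threshold are assumed values for illustration.
    """
    segments = []
    start = None  # index of the first frame of the segment in progress
    for i, energy in enumerate(energies):
        if energy >= threshold and start is None:
            start = i  # speech begins
        elif energy < threshold and start is not None:
            segments.append(Segment(speaker, start * frame_seconds, i * frame_seconds))
            start = None  # speech ends
    if start is not None:  # channel was still active at the end of the call
        segments.append(
            Segment(speaker, start * frame_seconds, len(energies) * frame_seconds)
        )
    return segments


def talk_time(segments):
    """Sum the duration of all segments per speaker, in seconds."""
    totals = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals
```

Because each stereo channel is treated as one speaker, every channel is processed on its own, so nothing further is needed to separate voices. A mono recording would first need a diarization step to produce comparable per-speaker input.
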
Gong applies different methods of participant identification, according to the type of call.
In mono call recordings, we only store voice identification for Gong users who have given their consent. Voice identification is not stored for any other call participants.
Here is one way that we identify Gong users on mono call recordings:
- Gong collects up to 5 short recordings of each recorded team member who has given their consent, taken from calls they participated in. For best results, we look for calls that:
  - Are mono telephony calls
  - Include at least 2 minutes of recorded speech
- Usually, Gong can accurately identify individuals from their second recorded call, based on the sample collected during their first call.
- Gong replaces these samples on an ongoing basis to keep them fresh and to increase recognition accuracy. This helps us identify the Gong user in varying conditions, such as when they start the call from a different environment, use a different telephony system, or use a new headset.
- As soon as we have enough samples for an individual, we revisit earlier calls where recorded team members were not identified, and use the samples to rerun voice identification.
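
The sample-collection rules above can be sketched as a small rolling store. This is a hedged illustration, not Gong's actual code: the `Call` and `VoiceProfile` types and the `maybe_add_sample` method are hypothetical names. Two simplifications are assumed: the "best results" preferences are treated as hard filters, and the oldest sample is the one replaced. The text only says that samples are replaced on an ongoing basis.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_SAMPLES = 5            # "up to 5 short recordings" per team member
MIN_SPEECH_SECONDS = 120   # "at least 2 minutes of recorded speech"


@dataclass
class Call:
    call_id: str
    is_mono_telephony: bool
    speech_seconds: float


@dataclass
class VoiceProfile:
    user_id: str
    has_consent: bool
    # A bounded deque drops the oldest entry once MAX_SAMPLES is reached
    # (assumed replacement policy).
    samples: deque = field(default_factory=lambda: deque(maxlen=MAX_SAMPLES))

    def maybe_add_sample(self, call):
        """Store a sample reference if the user consented and the call qualifies."""
        if not self.has_consent:
            return False  # nothing is stored without consent
        if not call.is_mono_telephony:
            return False
        if call.speech_seconds < MIN_SPEECH_SECONDS:
            return False
        self.samples.append(call.call_id)
        return True
```

For example, feeding six qualifying calls to a consenting user's profile leaves the five most recent ones in `samples`, while a profile without consent never stores anything.
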
For info on how to enable voice identification, see this