Diarize multiple speakers on long and short audio files with multilingual support.
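import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

// Assumption: `interfaze` is an OpenAI SDK client pointed at the Interfaze API.
// The endpoint and key are not shown on this page; these env var names are hypothetical.
const interfaze = new OpenAI({
  baseURL: process.env.INTERFAZE_BASE_URL,
  apiKey: process.env.INTERFAZE_API_KEY,
});

// Structured output: full transcript, per-segment speaker labels, and speaker count
const DiarizationSchema = z.object({
  full_text: z.string(),
  chunks: z.array(
    z.object({
      speaker_id: z.string(),
      text: z.string(),
      start_time: z.number(),
      end_time: z.number(),
    })
  ),
  number_of_speakers: z.number(),
});

const response = await interfaze.chat.completions.create({
  model: "interfaze-beta",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "Transcribe and identify the speakers in the audio file" },
        {
          type: "file",
          file: {
            filename: "stt_multispeaker.mp3",
            file_data: "https://r2public.jigsawstack.com/interfaze/examples/stt_multispeaker.mp3",
          },
        },
      ],
    },
  ],
  response_format: zodResponseFormat(DiarizationSchema, "diarization_schema"),
});

// The structured result arrives as a JSON string in the message content
console.log(response.choices[0].message.content);

// `precontext` holds the raw speech-to-text result; it is not part of the OpenAI SDK types
//@ts-expect-error precontext is not typed
const precontext = response.precontext;
console.log("STT Results:", precontext?.[0]?.result);
JSON output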
{
"object": {
"full_text": "Who interviewed you at YC? We actually had two interviews. So YC generally does one interview. We had two. So total seven or eight people interviewed me. And I remember five or six of them. So Paul Buchheit, who's the creator of Gmail, John Levy, who's YC's general counsel. There were two more people with them in the first group who interviewed us. And it's like a rapid fire. They ask one question after the other, even if I have not finished answering or they feel they got the answer. It's super fast. That's first 10-minute interview. Then the second 10-minute interview, we had Jessica Livingston, who's also the co-founder of YC, Aaron Harris, Dalton Caldwell. Two separate 10-minute interviews and group interviews. What I've read about YC is they're underwriting the founder, which makes all the sense in the world. Because you're going to learn, you're going to discover, you're going to have to pivot. And what they're trying to find is like, hey, does this person have the potential to be the next Mark Zuckerberg? In terms of like disposition, grit, mentality. What do you think that they were trying to get at with those interviews? What qualities were they trying to uncover or measure? I think they just want to understand founders' tenacity. the mindset of execution and how real the founders are. It's very counterintuitive, but a ten minute rapid fire interview gives you a lot more insight into the founder than a very long one hour conversation. What they're trying to understand is how well this person know what they are currently building. So they're testing for subject matter, not just tenacity. But how do they test on subject matter if they themselves don't come from the industry? All YC partners review applications before their interview. So the process is you do a written application, you get selected for an interview, and then the 10-minute interview happens. So before the interview, all the partners who are interviewing you have read your application. So even if they don't have the expertise, they are very insightful and sharp at asking the right questions. In our case, I still remember this one particular question where they're talking about competitors and who still is going to compete with what's the largest company that still can hope to become what's the analogy. And I said, Lending Club at the time, it had IPO at 10 billion valuation or something like that. And it's like, okay, Lending Club, like what's the valuation of the company? And I said, X is the valuation, Y is the stock price, so on and so forth. Paul, who I still remember, opened the computer and checked LendingClub's valuation and the stock price. And I was within two or three percent of what it was at that day. I checked you on the spot. And I could not have prepared for that. It was nowhere mentioned in our application or anywhere else. And I did not expect them to ask me about LendingClub's stock price. You did that interview solo or did you have a co-founder join you in that? I had my co-founder, CTO, join me in that, and we had prepared like crazy for that interview. We write all the possible questions that Y Combinator can ask us. We write answers, airtight answers to those questions. As in, if I give you an answer, it should answer the question and it should not result in another question. And it has to be done within 15 to 30 seconds. It's concise. You got to be concise, precise on target and answer at the appropriate level. Not too much, not too little. Within 30 seconds, because they'll just cut you off after 30 seconds. They have 10 minutes, mind you. If you do 10-minute interview, 30-second question answer each, that's still a lot of questions they can get through. So their goal at that valuation is to kind of remove duds, number one, right? They want to avoid false positives. They want to do a lot of checks though, I would say, at that valuation, especially if you have traction. On the other hand, the selectivity rate for Y Combinator is very low. Yeah, I don't know what it's now, but it's about, it was 1.2, 1.5%. More than Harvard. Yeah, for people like me who are immigrants who are applying for O1, Extraordinary Ability, and all these different types of visas, Y Combinator is more selective than Harvard.",
"chunks": [
{
"speaker_id": "SPEAKER_01",
"text": "Who interviewed you at YC?",
"start_time": 0,
"end_time": 1.36
},
{
"speaker_id": "SPEAKER_00",
"text": "We actually had two interviews.",
"start_time": 1.36,
"end_time": 2.56
},
{
"speaker_id": "SPEAKER_00",
"text": "So YC generally does one interview.",
"start_time": 2.56,
"end_time": 4.44
},
{
"speaker_id": "SPEAKER_00",
"text": "We had two.",
"start_time": 4.44,
"end_time": 5.52
},
{
"speaker_id": "SPEAKER_00",
"text": "So total seven or eight people interviewed me.",
"start_time": 5.52,
"end_time": 8.16
},
{
"speaker_id": "SPEAKER_00",
"text": "And I remember five or six of them.",
"start_time": 8.48,
"end_time": 10.08
},
{
"speaker_id": "SPEAKER_00",
"text": "So Paul Buchheit, who's the creator of Gmail,",
"start_time": 10.08,
"end_time": 12.36
},
{
"speaker_id": "SPEAKER_00",
"text": "John Levy, who's YC's general counsel.",
"start_time": 12.72,
"end_time": 15.48
},
{
"speaker_id": "SPEAKER_00",
"text": "There were two more people with them in the first group who interviewed us.",
"start_time": 15.74,
"end_time": 19.66
},
{
"speaker_id": "SPEAKER_00",
"text": "And it's like a rapid fire.",
"start_time": 19.68,
"end_time": 20.98
},
{
"speaker_id": "SPEAKER_00",
"text": "They ask one question after the other, even if I have not finished answering or they feel",
"start_time": 21.12,
"end_time": 24.96
},
{
"speaker_id": "SPEAKER_00",
"text": "they got the answer.",
"start_time": 24.96,
"end_time": 25.7
},
{
"speaker_id": "SPEAKER_00",
"text": "It's super fast.",
"start_time": 25.82,
"end_time": 26.62
},
{
"speaker_id": "SPEAKER_00",
"text": "That's first 10-minute interview.",
"start_time": 26.74,
"end_time": 28.06
},
{
"speaker_id": "SPEAKER_00",
"text": "Then the second 10-minute interview, we had Jessica Livingston, who's also the co-founder",
"start_time": 28.26,
"end_time": 32.52
},
{
"speaker_id": "SPEAKER_00",
"text": "of YC, Aaron Harris, Dalton Caldwell.",
"start_time": 32.52,
"end_time": 35.74
},
{
"speaker_id": "SPEAKER_01",
"text": "Two separate 10-minute interviews and group interviews.",
"start_time": 35.98,
"end_time": 40.68
},
{
"speaker_id": "SPEAKER_01",
"text": "What I've read about YC is they're underwriting the",
"start_time": 41.04,
"end_time": 43.78
},
{
"speaker_id": "SPEAKER_01",
"text": "founder, which makes all the sense in the world.",
"start_time": 43.78,
"end_time": 46.52
},
{
"speaker_id": "SPEAKER_01",
"text": "Because you're going to learn, you're going to discover, you're going to have to pivot.",
"start_time": 46.6,
"end_time": 49.24
},
{
"speaker_id": "SPEAKER_01",
"text": "And what they're trying to find is like, hey, does this",
"start_time": 49.96,
"end_time": 52.86
},
{
"speaker_id": "SPEAKER_01",
"text": "person have the potential to be the next Mark Zuckerberg?",
"start_time": 52.86,
"end_time": 55.76
},
{
"speaker_id": "SPEAKER_01",
"text": "In terms of like disposition, grit, mentality.",
"start_time": 55.84,
"end_time": 58.68
},
{
"speaker_id": "SPEAKER_01",
"text": "What do you think that they were trying to get at with those interviews?",
"start_time": 58.96,
"end_time": 62
},
{
"speaker_id": "SPEAKER_01",
"text": "What qualities were they trying to uncover or measure?",
"start_time": 62.12,
"end_time": 65.18
},
{
"speaker_id": "SPEAKER_00",
"text": "I think they just want to understand founders' tenacity.",
"start_time": 65.18,
"end_time": 69.4
},
{
"speaker_id": "SPEAKER_00",
"text": "the mindset of execution and how real the founders are.",
"start_time": 70.37,
"end_time": 73.91
},
{
"speaker_id": "SPEAKER_00",
"text": "It's very counterintuitive, but a ten minute rapid fire interview",
"start_time": 73.97,
"end_time": 78.41
},
{
"speaker_id": "SPEAKER_00",
"text": "gives you a lot more insight into the founder",
"start_time": 78.41,
"end_time": 82.31
},
{
"speaker_id": "SPEAKER_00",
"text": "than a very long one hour conversation.",
"start_time": 82.57,
"end_time": 85.03
},
{
"speaker_id": "SPEAKER_00",
"text": "What they're trying to understand is how well this person know",
"start_time": 85.31,
"end_time": 88.67
},
{
"speaker_id": "SPEAKER_00",
"text": "what they are currently building.",
"start_time": 88.71,
"end_time": 90.73
},
{
"speaker_id": "SPEAKER_01",
"text": "So they're testing for subject matter, not just tenacity.",
"start_time": 90.73,
"end_time": 94.07
},
{
"speaker_id": "SPEAKER_01",
"text": "But how do they test on subject matter if they themselves don't come",
"start_time": 94.17,
"end_time": 97.41
},
{
"speaker_id": "SPEAKER_01",
"text": "from the industry?",
"start_time": 97.41,
"end_time": 98.23
},
{
"speaker_id": "SPEAKER_00",
"text": "All YC partners review applications before their interview.",
"start_time": 98.48,
"end_time": 102.96
},
{
"speaker_id": "SPEAKER_00",
"text": "So the process is you do a written application,",
"start_time": 103.06,
"end_time": 106.96
},
{
"speaker_id": "SPEAKER_00",
"text": "you get selected for an interview,",
"start_time": 107.26,
"end_time": 108.62
},
{
"speaker_id": "SPEAKER_00",
"text": "and then the 10-minute interview happens.",
"start_time": 109.12,
"end_time": 111.02
},
{
"speaker_id": "SPEAKER_00",
"text": "So before the interview,",
"start_time": 111.12,
"end_time": 112.2
},
{
"speaker_id": "SPEAKER_00",
"text": "all the partners who are interviewing you",
"start_time": 112.6,
"end_time": 114.56
},
{
"speaker_id": "SPEAKER_00",
"text": "have read your application. So even if they don't",
"start_time": 114.72,
"end_time": 117.8
},
{
"speaker_id": "SPEAKER_00",
"text": "have the expertise, they are very insightful and",
"start_time": 117.8,
"end_time": 120.88
},
{
"speaker_id": "SPEAKER_00",
"text": "sharp at asking the right questions. In our",
"start_time": 120.88,
"end_time": 123.48
},
{
"speaker_id": "SPEAKER_00",
"text": "case, I still remember this one particular question",
"start_time": 123.48,
"end_time": 126.08
},
{
"speaker_id": "SPEAKER_00",
"text": "where they're talking about competitors and who still is going to compete with what's the largest",
"start_time": 126.08,
"end_time": 130.3
},
{
"speaker_id": "SPEAKER_00",
"text": "company that still can hope to become what's the",
"start_time": 130.3,
"end_time": 132.83
},
{
"speaker_id": "SPEAKER_00",
"text": "analogy. And I said, Lending Club at the time,",
"start_time": 132.83,
"end_time": 135.36
},
{
"speaker_id": "SPEAKER_00",
"text": "it had IPO at 10 billion valuation or something like that. And it's like, okay, Lending Club,",
"start_time": 135.44,
"end_time": 140.2
},
{
"speaker_id": "SPEAKER_00",
"text": "like what's the valuation of the company? And I said, X is the valuation, Y is the",
"start_time": 140.2,
"end_time": 144.02
},
{
"speaker_id": "SPEAKER_00",
"text": "stock price, so on and so forth.",
"start_time": 144.05,
"end_time": 145.79
},
{
"speaker_id": "SPEAKER_00",
"text": "Paul, who I still remember, opened the computer and checked LendingClub's valuation and the stock price.",
"start_time": 145.79,
"end_time": 150.59
},
{
"speaker_id": "SPEAKER_00",
"text": "And I was within two or three percent of what it was at that day.",
"start_time": 150.59,
"end_time": 154.25
},
{
"speaker_id": "SPEAKER_01",
"text": "I checked you on the spot.",
"start_time": 154.25,
"end_time": 155.87
},
{
"speaker_id": "SPEAKER_00",
"text": "And I could not have prepared for that.",
"start_time": 155.87,
"end_time": 157.69
},
{
"speaker_id": "SPEAKER_00",
"text": "It was nowhere mentioned in our application or anywhere else.",
"start_time": 157.69,
"end_time": 160.99
},
{
"speaker_id": "SPEAKER_00",
"text": "And I did not expect them to ask me about LendingClub's stock price.",
"start_time": 160.99,
"end_time": 163.93
},
{
"speaker_id": "SPEAKER_01",
"text": "You did that interview solo or did you have a co-founder join you in that?",
"start_time": 163.93,
"end_time": 167.55
},
{
"speaker_id": "SPEAKER_00",
"text": "I had my co-founder, CTO, join me in that,",
"start_time": 167.74,
"end_time": 170.58
},
{
"speaker_id": "SPEAKER_00",
"text": "and we had prepared like crazy for that interview.",
"start_time": 170.58,
"end_time": 173.42
},
{
"speaker_id": "SPEAKER_00",
"text": "We write all the possible questions that Y",
"start_time": 173.42,
"end_time": 176.14
},
{
"speaker_id": "SPEAKER_00",
"text": "Combinator can ask us. We write answers,",
"start_time": 176.14,
"end_time": 178.86
},
{
"speaker_id": "SPEAKER_00",
"text": "airtight answers to those questions. As in, if I",
"start_time": 178.86,
"end_time": 181.46
},
{
"speaker_id": "SPEAKER_00",
"text": "give you an answer, it should answer the question",
"start_time": 181.46,
"end_time": 184.06
},
{
"speaker_id": "SPEAKER_00",
"text": "and it should not result in another question. And it",
"start_time": 184.06,
"end_time": 186.7
},
{
"speaker_id": "SPEAKER_00",
"text": "has to be done within 15 to 30 seconds.",
"start_time": 186.7,
"end_time": 189.34
},
{
"speaker_id": "SPEAKER_01",
"text": "It's concise. You got to be concise, precise",
"start_time": 190.22,
"end_time": 193.22
},
{
"speaker_id": "SPEAKER_01",
"text": "on target and answer at the appropriate level.",
"start_time": 193.22,
"end_time": 196.22
},
{
"speaker_id": "SPEAKER_01",
"text": "Not too much, not too little.",
"start_time": 196.74,
"end_time": 198.5
},
{
"speaker_id": "SPEAKER_00",
"text": "Within 30 seconds, because they'll just cut you off after 30 seconds.",
"start_time": 199.08,
"end_time": 202.28
},
{
"speaker_id": "SPEAKER_00",
"text": "They have 10 minutes, mind you.",
"start_time": 202.34,
"end_time": 203.48
},
{
"speaker_id": "SPEAKER_00",
"text": "If you do 10-minute interview, 30-second question answer each, that's",
"start_time": 203.56,
"end_time": 206.49
},
{
"speaker_id": "SPEAKER_00",
"text": "still a lot of questions they can get through.",
"start_time": 206.49,
"end_time": 209.42
},
{
"speaker_id": "SPEAKER_01",
"text": "So their goal at that valuation is to",
"start_time": 209.42,
"end_time": 212.55
},
{
"speaker_id": "SPEAKER_01",
"text": "kind of remove duds, number one, right?",
"start_time": 212.55,
"end_time": 215.68
},
{
"speaker_id": "SPEAKER_01",
"text": "They want to avoid false positives.",
"start_time": 215.74,
"end_time": 217.36
},
{
"speaker_id": "SPEAKER_01",
"text": "They want to do a lot of checks though, I would say, at that valuation, especially if you have traction.",
"start_time": 219.76,
"end_time": 224.08
},
{
"speaker_id": "SPEAKER_01",
"text": "On the other hand, the selectivity rate for Y Combinator is very low.",
"start_time": 224.08,
"end_time": 227.2
},
{
"speaker_id": "SPEAKER_00",
"text": "Yeah, I don't know what it's now, but it's about, it was 1.2, 1.5%.",
"start_time": 228.13,
"end_time": 232.11
},
{
"speaker_id": "SPEAKER_01",
"text": "More than Harvard.",
"start_time": 232.81,
"end_time": 233.73
},
{
"speaker_id": "SPEAKER_00",
"text": "Yeah, for people like me who are immigrants who are applying for O1, Extraordinary Ability,",
"start_time": 234.07,
"end_time": 237.89
},
{
"speaker_id": "SPEAKER_00",
"text": "and all these different types of visas, Y Combinator is more selective than Harvard.",
"start_time": 237.89,
"end_time": 241.71
}
],
"number_of_speakers": 2
},
"finishReason": "stop",
"usage": {
"inputTokens": 26600,
"outputTokens": 26264,
"totalTokens": 52864
},
"response": {
"id": "interfaze-1775094905146",
"modelId": "interfaze-beta",
"body": {
"id": "interfaze-1775094905146",
"object": "chat.completion",
"model": "interfaze-beta",
"usage": {
"prompt_tokens": 26600,
"completion_tokens": 26264,
"total_tokens": 52864
},
"precontext": [
{
"name": "stt",
"result": {
"text": "Who interviewed you at YC? We actually had two interviews. So YC generally does one interview. We had two. So total seven or eight people interviewed me. And I remember five or six of them. So Paul Buchheit, who's the creator of Gmail, John Levy, who's YC's general counsel. There were two more people with them in the first group who interviewed us. And it's like a rapid fire. They ask one question after the other, even if I have not finished answering or they feel they got the answer. It's super fast. That's first 10-minute interview. Then the second 10-minute interview, we had Jessica Livingston, who's also the co-founder of YC, Aaron Harris, Dalton Caldwell. Two separate 10-minute interviews and group interviews. What I've read about YC is they're underwriting the founder, which makes all the sense in the world. Because you're going to learn, you're going to discover, you're going to have to pivot. And what they're trying to find is like, hey, does this person have the potential to be the next Mark Zuckerberg? In terms of like disposition, grit, mentality. What do you think that they were trying to get at with those interviews? What qualities were they trying to uncover or measure? I think they just want to understand founders' tenacity. the mindset of execution and how real the founders are. It's very counterintuitive, but a ten minute rapid fire interview gives you a lot more insight into the founder than a very long one hour conversation. What they're trying to understand is how well this person know what they are currently building. So they're testing for subject matter, not just tenacity. But how do they test on subject matter if they themselves don't come from the industry? All YC partners review applications before their interview. So the process is you do a written application, you get selected for an interview, and then the 10-minute interview happens. So before the interview, all the partners who are interviewing you have read your application. So even if they don't have the expertise, they are very insightful and sharp at asking the right questions. In our case, I still remember this one particular question where they're talking about competitors and who still is going to compete with what's the largest company that still can hope to become what's the analogy. And I said, Lending Club at the time, it had IPO at 10 billion valuation or something like that. And it's like, okay, Lending Club, like what's the valuation of the company? And I said, X is the valuation, Y is the stock price, so on and so forth. Paul, who I still remember, opened the computer and checked LendingClub's valuation and the stock price. And I was within two or three percent of what it was at that day. I checked you on the spot. And I could not have prepared for that. It was nowhere mentioned in our application or anywhere else. And I did not expect them to ask me about LendingClub's stock price. You did that interview solo or did you have a co-founder join you in that? I had my co-founder, CTO, join me in that, and we had prepared like crazy for that interview. We write all the possible questions that Y Combinator can ask us. We write answers, airtight answers to those questions. As in, if I give you an answer, it should answer the question and it should not result in another question. And it has to be done within 15 to 30 seconds. It's concise. You got to be concise, precise on target and answer at the appropriate level. Not too much, not too little. Within 30 seconds, because they'll just cut you off after 30 seconds. They have 10 minutes, mind you. If you do 10-minute interview, 30-second question answer each, that's still a lot of questions they can get through. So their goal at that valuation is to kind of remove duds, number one, right? They want to avoid false positives. They want to do a lot of checks though, I would say, at that valuation, especially if you have traction. On the other hand, the selectivity rate for Y Combinator is very low. Yeah, I don't know what it's now, but it's about, it was 1.2, 1.5%. More than Harvard. Yeah, for people like me who are immigrants who are applying for O1, Extraordinary Ability, and all these different types of visas, Y Combinator is more selective than Harvard.",
"chunks": [
{
"timestamp": [0, 1.36],
"text": "Who interviewed you at YC?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [1.36, 2.56],
"text": "We actually had two interviews.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [2.56, 4.44],
"text": "So YC generally does one interview.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [4.44, 5.52],
"text": "We had two.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [5.52, 8.16],
"text": "So total seven or eight people interviewed me.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [8.48, 10.08],
"text": "And I remember five or six of them.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [10.08, 12.36],
"text": "So Paul Buchheit, who's the creator of Gmail,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [12.72, 15.48],
"text": "John Levy, who's YC's general counsel.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [15.74, 19.66],
"text": "There were two more people with them in the first group who interviewed us.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [19.68, 20.98],
"text": "And it's like a rapid fire.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [21.12, 24.96],
"text": "They ask one question after the other, even if I have not finished answering or they feel",
"speaker": "SPEAKER_00"
},
{
"timestamp": [24.96, 25.7],
"text": "they got the answer.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [25.82, 26.62],
"text": "It's super fast.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [26.74, 28.06],
"text": "That's first 10-minute interview.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [28.26, 32.52],
"text": "Then the second 10-minute interview, we had Jessica Livingston, who's also the co-founder",
"speaker": "SPEAKER_00"
},
{
"timestamp": [32.52, 35.74],
"text": "of YC, Aaron Harris, Dalton Caldwell.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [35.98, 40.68],
"text": "Two separate 10-minute interviews and group interviews.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [41.04, 43.78],
"text": "What I've read about YC is they're underwriting the",
"speaker": "SPEAKER_01"
},
{
"timestamp": [43.78, 46.52],
"text": "founder, which makes all the sense in the world.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [46.6, 49.24],
"text": "Because you're going to learn, you're going to discover, you're going to have to pivot.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [49.96, 52.86],
"text": "And what they're trying to find is like, hey, does this",
"speaker": "SPEAKER_01"
},
{
"timestamp": [52.86, 55.76],
"text": "person have the potential to be the next Mark Zuckerberg?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [55.84, 58.68],
"text": "In terms of like disposition, grit, mentality.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [58.96, 62],
"text": "What do you think that they were trying to get at with those interviews?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [62.12, 65.18],
"text": "What qualities were they trying to uncover or measure?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [65.18, 69.4],
"text": "I think they just want to understand founders' tenacity.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [70.37, 73.91],
"text": "the mindset of execution and how real the founders are.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [73.97, 78.41],
"text": "It's very counterintuitive, but a ten minute rapid fire interview",
"speaker": "SPEAKER_00"
},
{
"timestamp": [78.41, 82.31],
"text": "gives you a lot more insight into the founder",
"speaker": "SPEAKER_00"
},
{
"timestamp": [82.57, 85.03],
"text": "than a very long one hour conversation.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [85.31, 88.67],
"text": "What they're trying to understand is how well this person know",
"speaker": "SPEAKER_00"
},
{
"timestamp": [88.71, 90.73],
"text": "what they are currently building.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [90.73, 94.07],
"text": "So they're testing for subject matter, not just tenacity.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [94.17, 97.41],
"text": "But how do they test on subject matter if they themselves don't come",
"speaker": "SPEAKER_01"
},
{
"timestamp": [97.41, 98.23],
"text": "from the industry?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [98.48, 102.96],
"text": "All YC partners review applications before their interview.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [103.06, 106.96],
"text": "So the process is you do a written application,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [107.26, 108.62],
"text": "you get selected for an interview,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [109.12, 111.02],
"text": "and then the 10-minute interview happens.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [111.12, 112.2],
"text": "So before the interview,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [112.6, 114.56],
"text": "all the partners who are interviewing you",
"speaker": "SPEAKER_00"
},
{
"timestamp": [114.72, 117.8],
"text": "have read your application. So even if they don't",
"speaker": "SPEAKER_00"
},
{
"timestamp": [117.8, 120.88],
"text": "have the expertise, they are very insightful and",
"speaker": "SPEAKER_00"
},
{
"timestamp": [120.88, 123.48],
"text": "sharp at asking the right questions. In our",
"speaker": "SPEAKER_00"
},
{
"timestamp": [123.48, 126.08],
"text": "case, I still remember this one particular question",
"speaker": "SPEAKER_00"
},
{
"timestamp": [126.08, 130.3],
"text": "where they're talking about competitors and who still is going to compete with what's the largest",
"speaker": "SPEAKER_00"
},
{
"timestamp": [130.3, 132.83],
"text": "company that still can hope to become what's the",
"speaker": "SPEAKER_00"
},
{
"timestamp": [132.83, 135.36],
"text": "analogy. And I said, Lending Club at the time,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [135.44, 140.2],
"text": "it had IPO at 10 billion valuation or something like that. And it's like, okay, Lending Club,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [140.2, 144.02],
"text": "like what's the valuation of the company? And I said, X is the valuation, Y is the",
"speaker": "SPEAKER_00"
},
{
"timestamp": [144.05, 145.79],
"text": "stock price, so on and so forth.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [145.79, 150.59],
"text": "Paul, who I still remember, opened the computer and checked LendingClub's valuation and the stock price.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [150.59, 154.25],
"text": "And I was within two or three percent of what it was at that day.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [154.25, 155.87],
"text": "I checked you on the spot.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [155.87, 157.69],
"text": "And I could not have prepared for that.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [157.69, 160.99],
"text": "It was nowhere mentioned in our application or anywhere else.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [160.99, 163.93],
"text": "And I did not expect them to ask me about LendingClub's stock price.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [163.93, 167.55],
"text": "You did that interview solo or did you have a co-founder join you in that?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [167.74, 170.58],
"text": "I had my co-founder, CTO, join me in that,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [170.58, 173.42],
"text": "and we had prepared like crazy for that interview.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [173.42, 176.14],
"text": "We write all the possible questions that Y",
"speaker": "SPEAKER_00"
},
{
"timestamp": [176.14, 178.86],
"text": "Combinator can ask us. We write answers,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [178.86, 181.46],
"text": "airtight answers to those questions. As in, if I",
"speaker": "SPEAKER_00"
},
{
"timestamp": [181.46, 184.06],
"text": "give you an answer, it should answer the question",
"speaker": "SPEAKER_00"
},
{
"timestamp": [184.06, 186.7],
"text": "and it should not result in another question. And it",
"speaker": "SPEAKER_00"
},
{
"timestamp": [186.7, 189.34],
"text": "has to be done within 15 to 30 seconds.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [190.22, 193.22],
"text": "It's concise. You got to be concise, precise",
"speaker": "SPEAKER_01"
},
{
"timestamp": [193.22, 196.22],
"text": "on target and answer at the appropriate level.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [196.74, 198.5],
"text": "Not too much, not too little.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [199.08, 202.28],
"text": "Within 30 seconds, because they'll just cut you off after 30 seconds.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [202.34, 203.48],
"text": "They have 10 minutes, mind you.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [203.56, 206.49],
"text": "If you do 10-minute interview, 30-second question answer each, that's",
"speaker": "SPEAKER_00"
},
{
"timestamp": [206.49, 209.42],
"text": "still a lot of questions they can get through.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [209.42, 212.55],
"text": "So their goal at that valuation is to",
"speaker": "SPEAKER_01"
},
{
"timestamp": [212.55, 215.68],
"text": "kind of remove duds, number one, right?",
"speaker": "SPEAKER_01"
},
{
"timestamp": [215.74, 217.36],
"text": "They want to avoid false positives.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [219.76, 224.08],
"text": "They want to do a lot of checks though, I would say, at that valuation, especially if you have traction.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [224.08, 227.2],
"text": "On the other hand, the selectivity rate for Y Combinator is very low.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [228.13, 232.11],
"text": "Yeah, I don't know what it's now, but it's about, it was 1.2, 1.5%.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [232.81, 233.73],
"text": "More than Harvard.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [234.07, 237.89],
"text": "Yeah, for people like me who are immigrants who are applying for O1, Extraordinary Ability,",
"speaker": "SPEAKER_00"
},
{
"timestamp": [237.89, 241.71],
"text": "and all these different types of visas, Y Combinator is more selective than Harvard.",
"speaker": "SPEAKER_00"
}
]
}
}
]
}
}
}
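The `chunks` array in the output above is segment-level: a single speaker turn is often split across several consecutive entries (for example the run of `SPEAKER_00` segments from 1.36 to 35.74). If you want one entry per speaker turn, you can merge adjacent chunks that share a `speaker_id`. The sketch below is a minimal example that assumes only the chunk fields defined in `DiarizationSchema`; `mergeTurns`, `Chunk`, and `Turn` are hypothetical helper names, not part of the Interfaze API.

```typescript
// Chunk shape mirrors the `chunks` entries defined by DiarizationSchema
interface Chunk {
  speaker_id: string;
  text: string;
  start_time: number;
  end_time: number;
}

// A merged speaker turn (hypothetical helper type)
interface Turn extends Chunk {
  chunkCount: number; // how many original chunks were merged into this turn
}

// Merge runs of consecutive chunks that share a speaker_id into single turns.
function mergeTurns(chunks: Chunk[]): Turn[] {
  const turns: Turn[] = [];
  for (const chunk of chunks) {
    const last = turns[turns.length - 1];
    if (last && last.speaker_id === chunk.speaker_id) {
      // Same speaker keeps talking: extend the current turn
      last.text = `${last.text} ${chunk.text}`;
      last.end_time = chunk.end_time;
      last.chunkCount += 1;
    } else {
      // First chunk or a speaker change: start a new turn (copy, so input is not mutated)
      turns.push({ ...chunk, chunkCount: 1 });
    }
  }
  return turns;
}

// First four chunks from the sample output above
const sample: Chunk[] = [
  { speaker_id: "SPEAKER_01", text: "Who interviewed you at YC?", start_time: 0, end_time: 1.36 },
  { speaker_id: "SPEAKER_00", text: "We actually had two interviews.", start_time: 1.36, end_time: 2.56 },
  { speaker_id: "SPEAKER_00", text: "So YC generally does one interview.", start_time: 2.56, end_time: 4.44 },
  { speaker_id: "SPEAKER_00", text: "We had two.", start_time: 4.44, end_time: 5.52 },
];

for (const turn of mergeTurns(sample)) {
  console.log(`[${turn.start_time}-${turn.end_time}] ${turn.speaker_id}: ${turn.text}`);
}
```

Timestamps are copied rather than recomputed, so a merged turn runs from its first chunk's `start_time` to its last chunk's `end_time`, including any short pauses in between. With the request above, you could apply it to `JSON.parse(response.choices[0].message.content).chunks`.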
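The request code reads `precontext` behind a `@ts-expect-error`. In the sample output, the `stt` entry of `precontext` carries the raw speech-to-text chunks as `{ timestamp: [start, end], text, speaker }`, while the structured output uses `start_time`, `end_time`, and `speaker_id`. The sketch below gives that entry a local type and maps it onto the structured field names. The types are inferred from the sample output only, not from a published Interfaze type definition, so treat them as an assumption; `SttChunk`, `PrecontextEntry`, `DiarizedChunk`, and `fromPrecontext` are hypothetical names.

```typescript
// Assumed shape of one raw STT chunk, inferred from the sample precontext output
interface SttChunk {
  timestamp: [number, number];
  text: string;
  speaker: string;
}

// Assumed shape of a precontext entry; only the entry named "stt" is used here
interface PrecontextEntry {
  name: string;
  result?: { text?: string; chunks?: SttChunk[] };
}

// Same field names as the chunks in DiarizationSchema
interface DiarizedChunk {
  speaker_id: string;
  text: string;
  start_time: number;
  end_time: number;
}

// Find the "stt" entry and rename its chunk fields to match the structured output.
function fromPrecontext(precontext: PrecontextEntry[] | undefined): DiarizedChunk[] {
  const stt = precontext?.find((entry) => entry.name === "stt");
  const sttChunks: SttChunk[] = stt?.result?.chunks ?? [];
  return sttChunks.map(({ timestamp: [start, end], text, speaker }) => ({
    speaker_id: speaker,
    text,
    start_time: start,
    end_time: end,
  }));
}

// Two entries copied from the sample precontext output
const example: PrecontextEntry[] = [
  {
    name: "stt",
    result: {
      text: "Who interviewed you at YC? We actually had two interviews.",
      chunks: [
        { timestamp: [0, 1.36], text: "Who interviewed you at YC?", speaker: "SPEAKER_01" },
        { timestamp: [1.36, 2.56], text: "We actually had two interviews.", speaker: "SPEAKER_00" },
      ],
    },
  },
];

const chunks = fromPrecontext(example);
console.log(chunks);
console.log("distinct speakers:", new Set(chunks.map((c) => c.speaker_id)).size);
```

Having both sources in one shape makes it easy to compare the raw diarization with the structured result, for example to spot chunks where the two disagree on the speaker label.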
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
const SentimentSchema = z.object({
full_text: z.string(),
chunks: z.array(
z.object({
speaker_id: z.string(),
text: z.string(),
sentiment: z.enum(["positive", "negative", "neutral"]).describe("sentiment of the audio chunk"),
start_time: z.number(),
end_time: z.number(),
})
),
number_of_speakers: z.number(),
});
const response = await interfaze.chat.completions.create({
model: "interfaze-beta",
messages: [
{
role: "user",
content: [
{ type: "text", text: "Transcribe the audio file, identify the speakers, and analyze the sentiment of each speaker" },
{
type: "file",
file: {
filename: "stt_call.mp3",
file_data: "https://r2public.jigsawstack.com/interfaze/examples/stt_call.mp3",
},
},
],
},
],
response_format: zodResponseFormat(SentimentSchema, "sentiment_schema"),
});
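// Structured result: a JSON string matching SentimentSchema
console.log(response.choices[0].message.content);
// `precontext` carries the raw speech-to-text output (timestamped, speaker-labelled chunks) and is not in the OpenAI SDK types
//@ts-expect-error precontext is not typed
const precontext = response.precontext;
console.log("STT Results:", precontext?.[0]?.result);
JSON output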
{
"object": {
"full_text": "Hi, thank you so much for calling Wham. My name is Allie. How can I help you today? Hey, just trying to change the payment info on the website since my sub is about to renew. I was wondering if you could do it on the phone. Yeah, that shouldn't be a problem. Could I have your first and last name, please? Aaron Schertz, S-C-H-E-R-T-Z. All right. Thank you so much, Aaron. Could I also have your phone number, please? 713-899-0745. Excellent. And I'm just going to verify with your security question really quick, if you don't mind. What street did you grow up on? Cypress Avenue. Okay. Excellent. Thank you so much. And I can just go ahead and update that card info for you. So first, what is the card number? 4708. Okay. 1209. Okay. 8732. Uh-huh. 7655. Great. And could I also have your expiration date as well? February 2028. Great. And could I also have your CVC on the back? 482 Okay, thank you And the billing address is still the same? Yep Okay, great Then you're all set Thank you so much for calling in",
"chunks": [
{
"speaker_id": "SPEAKER_00",
"text": "Hi, thank you so much for calling Wham. My name is Allie. How can I help you today?",
"sentiment": "neutral",
"start_time": 0,
"end_time": 5
},
{
"speaker_id": "SPEAKER_01",
"text": "Hey, just trying to change the payment info on",
"sentiment": "neutral",
"start_time": 5,
"end_time": 8.5
},
{
"speaker_id": "SPEAKER_01",
"text": "the website since my sub is about to renew.",
"sentiment": "neutral",
"start_time": 8.5,
"end_time": 12
},
{
"speaker_id": "SPEAKER_01",
"text": "I was wondering if you could do it on the phone.",
"sentiment": "neutral",
"start_time": 12,
"end_time": 14
},
{
"speaker_id": "SPEAKER_00",
"text": "Yeah, that shouldn't be a problem. Could",
"sentiment": "positive",
"start_time": 14,
"end_time": 17.5
},
{
"speaker_id": "SPEAKER_00",
"text": "I have your first and last name, please?",
"sentiment": "neutral",
"start_time": 17.5,
"end_time": 21
},
{
"speaker_id": "SPEAKER_01",
"text": "Aaron Schertz, S-C-H-E-R-T-Z.",
"sentiment": "neutral",
"start_time": 21,
"end_time": 26
},
{
"speaker_id": "SPEAKER_00",
"text": "All right. Thank you so much, Aaron.",
"sentiment": "positive",
"start_time": 26,
"end_time": 28.88
},
{
"speaker_id": "SPEAKER_00",
"text": "Could I also have your phone number, please?",
"sentiment": "neutral",
"start_time": 28.88,
"end_time": 31.76
},
{
"speaker_id": "SPEAKER_01",
"text": "713-899-0745.",
"sentiment": "neutral",
"start_time": 32.58,
"end_time": 36.16
},
{
"speaker_id": "SPEAKER_00",
"text": "Excellent. And I'm just going to verify with",
"sentiment": "positive",
"start_time": 37.62,
"end_time": 40.32
},
{
"speaker_id": "SPEAKER_00",
"text": "your security question really quick, if you don't mind.",
"sentiment": "neutral",
"start_time": 40.32,
"end_time": 43.02
},
{
"speaker_id": "SPEAKER_00",
"text": "What street did you grow up on?",
"sentiment": "neutral",
"start_time": 43.62,
"end_time": 45.32
},
{
"speaker_id": "SPEAKER_01",
"text": "Cypress Avenue.",
"sentiment": "neutral",
"start_time": 45.86,
"end_time": 46.86
},
{
"speaker_id": "SPEAKER_00",
"text": "Okay. Excellent. Thank you so much. And I can",
"sentiment": "positive",
"start_time": 47.98,
"end_time": 51.26
},
{
"speaker_id": "SPEAKER_00",
"text": "just go ahead and update that card info for you.",
"sentiment": "neutral",
"start_time": 51.26,
"end_time": 54.54
},
{
"speaker_id": "SPEAKER_00",
"text": "So first, what is the card number?",
"sentiment": "neutral",
"start_time": 56,
"end_time": 58.74
},
{
"speaker_id": "SPEAKER_01",
"text": "4708.",
"sentiment": "neutral",
"start_time": 61.36,
"end_time": 62.04
},
{
"speaker_id": "SPEAKER_00",
"text": "Okay.",
"sentiment": "neutral",
"start_time": 62.78,
"end_time": 63.48
},
{
"speaker_id": "SPEAKER_01",
"text": "1209.",
"sentiment": "neutral",
"start_time": 64.56,
"end_time": 65.24
},
{
"speaker_id": "SPEAKER_00",
"text": "Okay.",
"sentiment": "neutral",
"start_time": 65.62,
"end_time": 66.14
},
{
"speaker_id": "SPEAKER_01",
"text": "8732.",
"sentiment": "neutral",
"start_time": 67.22,
"end_time": 67.94
},
{
"speaker_id": "SPEAKER_00",
"text": "Uh-huh.",
"sentiment": "neutral",
"start_time": 68.74,
"end_time": 69.06
},
{
"speaker_id": "SPEAKER_01",
"text": "7655.",
"sentiment": "neutral",
"start_time": 70.22,
"end_time": 70.86
},
{
"speaker_id": "SPEAKER_00",
"text": "Great.",
"sentiment": "positive",
"start_time": 72.26,
"end_time": 72.94
},
{
"speaker_id": "SPEAKER_00",
"text": "And could I also have your expiration date as well?",
"sentiment": "neutral",
"start_time": 73.24,
"end_time": 76.38
},
{
"speaker_id": "SPEAKER_01",
"text": "February 2028.",
"sentiment": "neutral",
"start_time": 76.84,
"end_time": 78.42
},
{
"speaker_id": "SPEAKER_00",
"text": "Great.",
"sentiment": "positive",
"start_time": 79.84,
"end_time": 80.52
},
{
"speaker_id": "SPEAKER_00",
"text": "And could I also have your CVC on the back?",
"sentiment": "neutral",
"start_time": 80.92,
"end_time": 84.32
},
{
"speaker_id": "SPEAKER_01",
"text": "482",
"sentiment": "neutral",
"start_time": 84.32,
"end_time": 86.76
},
{
"speaker_id": "SPEAKER_00",
"text": "Okay, thank you",
"sentiment": "positive",
"start_time": 86.76,
"end_time": 89.14
},
{
"speaker_id": "SPEAKER_00",
"text": "And the billing address is still the same?",
"sentiment": "neutral",
"start_time": 89.14,
"end_time": 91.88
},
{
"speaker_id": "SPEAKER_01",
"text": "Yep",
"sentiment": "neutral",
"start_time": 92.32,
"end_time": 92.68
},
{
"speaker_id": "SPEAKER_00",
"text": "Okay, great",
"sentiment": "positive",
"start_time": 92.68,
"end_time": 94.2
},
{
"speaker_id": "SPEAKER_00",
"text": "Then you're all set",
"sentiment": "positive",
"start_time": 94.2,
"end_time": 96.66
},
{
"speaker_id": "SPEAKER_00",
"text": "Thank you so much for calling in",
"sentiment": "positive",
"start_time": 96.66,
"end_time": 98
}
],
"number_of_speakers": 2
},
"response": {
"id": "interfaze-1775096308523",
"modelId": "interfaze-beta",
"body": {
"id": "interfaze-1775096308523",
"object": "chat.completion",
"model": "interfaze-beta",
"usage": {
"prompt_tokens": 10958,
"completion_tokens": 21596,
"total_tokens": 32554
},
"precontext": [
{
"name": "stt",
"result": {
"text": "Hi, thank you so much for calling Wham. My name is Allie. How can I help you today? Hey, just trying to change the payment info on the website since my sub is about to renew. I was wondering if you could do it on the phone. Yeah, that shouldn't be a problem. Could I have your first and last name, please? Aaron Schertz, S-C-H-E-R-T-Z. All right. Thank you so much, Aaron. Could I also have your phone number, please? 713-899-0745. Excellent. And I'm just going to verify with your security question really quick, if you don't mind. What street did you grow up on? Cypress Avenue. Okay. Excellent. Thank you so much. And I can just go ahead and update that card info for you. So first, what is the card number? 4708. Okay. 1209. Okay. 8732. Uh-huh. 7655. Great. And could I also have your expiration date as well? February 2028. Great. And could I also have your CVC on the back? 482 Okay, thank you And the billing address is still the same? Yep Okay, great Then you're all set Thank you so much for calling in",
"chunks": [
{
"timestamp": [0, 5],
"text": "Hi, thank you so much for calling Wham. My name is Allie. How can I help you today?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [5, 8.5],
"text": "Hey, just trying to change the payment info on",
"speaker": "SPEAKER_01"
},
{
"timestamp": [8.5, 12],
"text": "the website since my sub is about to renew.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [12, 14],
"text": "I was wondering if you could do it on the phone.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [14, 17.5],
"text": "Yeah, that shouldn't be a problem. Could",
"speaker": "SPEAKER_00"
},
{
"timestamp": [17.5, 21],
"text": "I have your first and last name, please?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [21, 26],
"text": "Aaron Schertz, S-C-H-E-R-T-Z.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [26, 28.88],
"text": "All right. Thank you so much, Aaron.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [28.88, 31.76],
"text": "Could I also have your phone number, please?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [32.58, 36.16],
"text": "713-899-0745.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [37.62, 40.32],
"text": "Excellent. And I'm just going to verify with",
"speaker": "SPEAKER_00"
},
{
"timestamp": [40.32, 43.02],
"text": "your security question really quick, if you don't mind.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [43.62, 45.32],
"text": "What street did you grow up on?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [45.86, 46.86],
"text": "Cypress Avenue.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [47.98, 51.26],
"text": "Okay. Excellent. Thank you so much. And I can",
"speaker": "SPEAKER_00"
},
{
"timestamp": [51.26, 54.54],
"text": "just go ahead and update that card info for you.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [56, 58.74],
"text": "So first, what is the card number?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [61.36, 62.04],
"text": "4708.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [62.78, 63.48],
"text": "Okay.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [64.56, 65.24],
"text": "1209.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [65.62, 66.14],
"text": "Okay.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [67.22, 67.94],
"text": "8732.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [68.74, 69.06],
"text": "Uh-huh.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [70.22, 70.86],
"text": "7655.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [72.26, 72.94],
"text": "Great.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [73.24, 76.38],
"text": "And could I also have your expiration date as well?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [76.84, 78.42],
"text": "February 2028.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [79.84, 80.52],
"text": "Great.",
"speaker": "SPEAKER_00"
},
{
"timestamp": [80.92, 84.32],
"text": "And could I also have your CVC on the back?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [84.32, 86.76],
"text": "482",
"speaker": "SPEAKER_01"
},
{
"timestamp": [86.76, 89.14],
"text": "Okay, thank you",
"speaker": "SPEAKER_00"
},
{
"timestamp": [89.14, 91.88],
"text": "And the billing address is still the same?",
"speaker": "SPEAKER_00"
},
{
"timestamp": [92.32, 92.68],
"text": "Yep",
"speaker": "SPEAKER_01"
},
{
"timestamp": [92.68, 94.2],
"text": "Okay, great",
"speaker": "SPEAKER_00"
},
{
"timestamp": [94.2, 96.66],
"text": "Then you're all set",
"speaker": "SPEAKER_00"
},
{
"timestamp": [96.66, 98],
"text": "Thank you so much for calling in",
"speaker": "SPEAKER_00"
}
]
}
}
]
}
},
"finishReason": "stop",
"usage": {
"inputTokens": 10958,
"outputTokens": 21596,
"totalTokens": 32554
}
}
For the best performance on long audio files, run the speech-to-text task directly by adding <task>speech_to_text</task> to the system prompt. This activates only the part of the model used for audio.
OpenAI SDK
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";
// Assumes `interfaze` is an OpenAI SDK client already configured for the Interfaze API
const response = await interfaze.chat.completions.create({
model: "interfaze-beta",
messages: [
{
role: "system",
content: "<task>speech_to_text</task>",
},
{
role: "user",
content: [
{ type: "text", text: "Transcribe and identify the speakers in the audio file https://r2public.jigsawstack.com/interfaze/examples/stt_long_audio_sample_3.mp3" },
],
},
],
response_format: zodResponseFormat(z.any(), "empty_schema"),
});
console.log(response.choices[0].message.content);
Transcribing and diarizing a 1 h 35 min audio file this way took 1 min 10 s.
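The timing figure above can be expressed as a real-time factor (processing time divided by audio duration), a common way to compare speech-to-text throughput. This is plain arithmetic on the quoted numbers from a single run, not a benchmark; actual timing may vary with the file and service load.

```typescript
// Throughput arithmetic for the figure quoted above:
// 1 h 35 min of audio processed in 1 min 10 s.
const audioSeconds = 1 * 3600 + 35 * 60; // 5700 s of audio
const processingSeconds = 1 * 60 + 10; // 70 s of wall-clock processing

// Real-time factor: processing time / audio duration (below 1 means faster than real time).
const rtf = processingSeconds / audioSeconds;
// Speed-up: seconds of audio handled per second of processing.
const speedUp = audioSeconds / processingSeconds;

console.log(`RTF ${rtf.toFixed(4)}, about ${speedUp.toFixed(1)}x faster than real time`);
// prints "RTF 0.0123, about 81.4x faster than real time"
```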
JSON output
{
"object": {
"name": "speech_to_text",
"result": {
"text": "We don't teach leaders how to have uncomfortable conversations. We don't teach students how to have uncomfortable conversations. You tell me which is going to be more valuable for the rest of your life. How to have a difficult conversation or trigonometry. Described as a visionary thinker with a rare intellect. Multiple time best-selling author. Scientific. Every single one of us knows what we do. Some of us know how we do it. But very, very few of us can clearly articulate why we do what we do. And I think one of the reasons most of us don't know who we are is because we're making decisions that are inconsistent with that true cause, with that why. There's a great irony in all of this. I had what a lot of people would be considered a good life and yet didn't want to wake up and go to work anymore. Why? We cannot do this thing called career or life alone. We're just not that smart. We're not that strong. We're just not that good. For anyone who wants to be a better version of themselves, it's one of the best podcasts I've ever done. So without further ado, I'm Stephen Bartlett, and this is the Diary of a CEO USA edition. I hope nobody's listening, but if you are, then please keep this to yourself. Simon, my introduction to you was... this book start with why? And it hung on the walls of some of my offices around the world for a long time. And then my employees would come in after reading the book and evangelize about it. And it would come up in meetings and in discussions and in creative brainstorms, et cetera, over and over and over again. The question I wanted to ask you was, was there a point in your life where you'd felt like you drifted so far from your why that you realized the importance of it for the first time? that set me on the path to find it in the first place, to even articulate that idea. I had what a lot of people would be considered sort of a good life, as living the proverbial American dream. I quit my job to start my own business. The business was doing okay, made an okay living, had great clients, did good work. And yet... I'd lost my passion for that and didn't want to wake up and go to work anymore, which was embarrassing because superficially everything was just fine. I was pretending that I was happier, more in control, and more successful than I was or felt, which is quite frankly pretty draining and pretty dark. And it wasn't until a very, very close friend of mine came to me and said something's wrong. She was the first one to notice something. And I came clean and I sort of let it all out. It was that catharsis that sort of lifted this heavy weight off my shoulders. I was no longer alone. It was no longer a secret. And all of the energy that was previously going into lying, hiding, and faking now went into finding a solution. There was a confluence of events. All of these histories are perfectly neat and clean, and that's not really how it is or was. To compress it and oversimplify it, I made this discovery based on the biology of human decision-making that every single one of us knows what we do. Some of us know how we do it, but very, very few of us can clearly articulate why we do what we do. And I realized that was what I was missing. So to answer your question, yes, 100%. The realization of the why was my loss of it. And I realized...",
"chunks": [
{
"timestamp": [0, 2.12],
"text": "We don't teach leaders how to have uncomfortable conversations.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [2.38, 4.54],
"text": "We don't teach students how to have uncomfortable conversations.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [4.84, 7.36],
"text": "You tell me which is going to be more valuable for the rest of your life.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [7.48, 9.54],
"text": "How to have a difficult conversation or trigonometry.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [10.4, 13.08],
"text": "Described as a visionary thinker with a rare intellect.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [13.24, 14.88],
"text": "Multiple time best-selling author.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [15.66, 16.02],
"text": "Scientific.",
"speaker": "SPEAKER_01"
},
{
"timestamp": [16.48, 18.14],
"text": "Every single one of us knows what we do.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [18.42, 19.68],
"text": "Some of us know how we do it.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [19.8, 22.6],
"text": "But very, very few of us can clearly articulate why we do what we do.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [22.75, 25.31],
"text": "And I think one of the reasons most of us don't know who we are",
"speaker": "SPEAKER_02"
},
{
"timestamp": [25.31, 29.71],
"text": "is because we're making decisions that are inconsistent with that true cause, with that why.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [30.07, 32.23],
"text": "There's a great irony in all of this.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [33.11, 36.19],
"text": "I had what a lot of people would be considered a good life",
"speaker": "SPEAKER_02"
},
{
"timestamp": [36.19, 38.43],
"text": "and yet didn't want to wake up and go to work anymore.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [38.77, 39.07],
"text": "Why?",
"speaker": "SPEAKER_03"
},
{
"timestamp": [40.45, 45.32],
"text": "We cannot do this thing called career or life alone.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [45.32, 48.08],
"text": "We're just not that smart. We're not that strong. We're just not that good.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [48.42, 50.74],
"text": "For anyone who wants to be a better version of themselves,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [55.6, 58.81],
"text": "it's one of the best podcasts I've ever",
"speaker": "SPEAKER_02"
},
{
"timestamp": [58.81, 62.01],
"text": "done. So without further ado, I'm Stephen Bartlett,",
"speaker": "SPEAKER_03"
},
{
"timestamp": [62.21, 65.02],
"text": "and this is the Diary of a CEO USA",
"speaker": "SPEAKER_03"
},
{
"timestamp": [65.02, 67.83],
"text": "edition. I hope nobody's listening, but if you are,",
"speaker": "SPEAKER_03"
},
{
"timestamp": [68.43, 72.84],
"text": "then please keep this",
"speaker": "SPEAKER_03"
},
{
"timestamp": [72.84, 77.25],
"text": "to yourself. Simon, my",
"speaker": "UNKNOWN"
},
{
"timestamp": [77.25, 81.66],
"text": "introduction to you was...",
"speaker": "SPEAKER_03"
},
{
"timestamp": [82.21, 86.37],
"text": "this book start with why? And it hung on the walls of some of my offices around the world",
"speaker": "SPEAKER_03"
},
{
"timestamp": [86.37, 90.91],
"text": "for a long time. And then my employees would come in after reading the book and evangelize about it.",
"speaker": "SPEAKER_03"
},
{
"timestamp": [90.99, 94.47],
"text": "And it would come up in meetings and in discussions and in creative brainstorms,",
"speaker": "SPEAKER_03"
},
{
"timestamp": [94.51, 97.99],
"text": "et cetera, over and over and over again. The question I wanted to ask you was,",
"speaker": "SPEAKER_03"
},
{
"timestamp": [98.55, 101.39],
"text": "was there a point in your life where you'd",
"speaker": "SPEAKER_03"
},
{
"timestamp": [101.39, 104.23],
"text": "felt like you drifted so far from your why",
"speaker": "SPEAKER_03"
},
{
"timestamp": [104.23, 107.61],
"text": "that you realized the importance of it for the first time?",
"speaker": "SPEAKER_03"
},
{
"timestamp": [112.02, 116.5],
"text": "that set me on the path to find it in the first place,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [116.6, 119.02],
"text": "to even articulate that idea.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [120.76, 123.8],
"text": "I had what a lot of people would be considered",
"speaker": "SPEAKER_02"
},
{
"timestamp": [123.8, 126],
"text": "sort of a good life,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [126.04, 128.16],
"text": "as living the proverbial American dream.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [128.78, 131.9],
"text": "I quit my job to start my own business.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [131.9, 135.6],
"text": "The business was doing okay, made an okay living,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [135.82, 138.1],
"text": "had great clients, did good work.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [139.02, 139.9],
"text": "And yet...",
"speaker": "SPEAKER_02"
},
{
"timestamp": [139.58, 143.72],
"text": "I'd lost my passion for that and didn't want to wake up and go to work anymore,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [144.02, 147.72],
"text": "which was embarrassing because superficially everything was just fine.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [148.54, 152.16],
"text": "I was pretending that I was happier, more in control,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [152.54, 154.92],
"text": "and more successful than I was or felt,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [155.8, 158.38],
"text": "which is quite frankly pretty draining and pretty dark.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [159.32, 161.42],
"text": "And it wasn't until a very, very close friend of mine came to me",
"speaker": "SPEAKER_02"
},
{
"timestamp": [161.42, 162.2],
"text": "and said something's wrong.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [162.28, 163.96],
"text": "She was the first one to notice something.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [164.44, 167.8],
"text": "And I came clean and I sort of let it all out.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [169.87, 174.67],
"text": "It was that catharsis that sort of lifted this heavy weight off my shoulders.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [174.99, 176.05],
"text": "I was no longer alone.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [176.19, 177.15],
"text": "It was no longer a secret.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [177.97, 181.13],
"text": "And all of the energy that was previously going into lying, hiding, and faking",
"speaker": "SPEAKER_02"
},
{
"timestamp": [181.13, 182.63],
"text": "now went into finding a solution.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [183.68, 185.5],
"text": "There was a confluence of events.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [185.5, 190],
"text": "All of these histories are perfectly neat and clean,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [190.14, 192.28],
"text": "and that's not really how it is or was.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [193.6, 195.7],
"text": "To compress it and oversimplify it,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [196, 198.56],
"text": "I made this discovery based on the biology of human decision-making",
"speaker": "SPEAKER_02"
},
{
"timestamp": [198.56, 200.3],
"text": "that every single one of us knows what we do.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [201, 202.38],
"text": "Some of us know how we do it,",
"speaker": "SPEAKER_02"
},
{
"timestamp": [202.7, 205.62],
"text": "but very, very few of us can clearly articulate why we do what we do.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [206, 207.98],
"text": "And I realized that was what I was missing.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [208.68, 212.08],
"text": "So to answer your question, yes, 100%.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [212.08, 214.52],
"text": "The realization of the why was my loss of it.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [215.2, 216.08],
"text": "And I realized...",
"speaker": "SPEAKER_02"
},
{
"timestamp": [215.94, 217.62],
"text": "I knew what I did and I was good at it.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [217.72, 220.08],
"text": "I knew how I was different or special or stood out from the crowd",
"speaker": "SPEAKER_02"
},
{
"timestamp": [220.08, 222.44],
"text": "and that was my differentiating value proposition",
"speaker": "SPEAKER_02"
},
{
"timestamp": [222.44, 224.04],
"text": "and I was articulate about it.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [224.32, 226.84],
"text": "But I couldn't tell you why I was waking out of bed every day to do it.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [227.38, 230.78],
"text": "And I would give some nonsense entrepreneur answer",
"speaker": "SPEAKER_02"
},
{
"timestamp": [230.78, 231.9],
"text": "because I want to be my own boss.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [232.02, 234.8],
"text": "I'm like, yeah, sure, but that's not a reason to get out of bed every day.",
"speaker": "SPEAKER_02"
},
{
"timestamp": [235.52, 238.46],
"text": "This got me thinking a lot about the guests that I have sit here",
"speaker": "SPEAKER_03"
},
{
"timestamp": [238.46, 241.12],
"text": "and also my own story where sometimes",
"speaker": "SPEAKER_03"
},
{
"timestamp": [241.65, 244.2],
"text": "I think people's why or the thing that's been driving",
"speaker": "SPEAKER_03"
},
{
"timestamp": [244.2, 246.75],
"text": "them is in fact some kind of trauma or",
"speaker": "SPEAKER_03"
},
{
"timestamp": [246.75, 249.69],
"text": "insecurity I think because you sit here with",
"speaker": "SPEAKER_03"
},
{
"timestamp": [249.69, 252.62],
"text": "people in there whether it's whether it's Israel",
"speaker": "SPEAKER_03"
},
{
"timestamp": [252.62, 255.96],
"text": "Adesanya the UFC champion who's the current maybe",
"speaker": "SPEAKER_03"
},
{
"timestamp": [255.96, 259.3],
"text": "world's best UFC fighter he was battered and",
"speaker": "SPEAKER_03"
},
{
"timestamp": [259.3, 262.03],
"text": "bullied as a kid being the only black kid in",
"speaker": "SPEAKER_03"
},
{
"timestamp": [262.03, 264.76],
"text": "his school in New Zealand and so it's no coincidence",
"speaker": "SPEAKER_03"
},
{
"timestamp": [264.76, 267.46],
"text": "that he strived to be this fighter and in fact when",
"speaker": "SPEAKER_03"
},
{
"timestamp": [267.46, 270.16],
"text": "he won the UFC title the next day he was",
"speaker": "SPEAKER_03"
},
{
"timestamp": [271.28, 272.16],
"text": "He went to therapy.",
"speaker": "SPEAKER_03"
},
{
"timestamp": [273.2, 277.46],
"text": "That's made me question whether our whys can sometimes be trauma",
"speaker": "SPEAKER_03"
},
{
"timestamp": [277.46, 280.68],
"text": "or insecurity driven as opposed to",
"speaker": "SPEAKER_03"
},
{
"timestamp": [280.68, 283.89],
"text": "being intentional and I don't know.",
"speaker": "SPEAKER_03"
}
]
}
},
"finishReason": "stop",
"usage": {
"inputTokens": 88957,
"outputTokens": 68648,
"totalTokens": 157605
}
}
The output above is truncated for this example.
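The raw speech_to_text result is a flat list of chunks, each with a `timestamp` pair (start and end, apparently in seconds), the chunk `text`, and a `speaker` label (`UNKNOWN` when no speaker was assigned). A minimal sketch for post-processing it, assuming that shape as it appears in the sample output rather than any published type: merge consecutive chunks from the same speaker into turns, and total talk time per speaker. `SttChunk`, `Turn`, `mergeTurns`, and `talkTime` are hypothetical names for illustration.

```typescript
// Shape of one raw STT chunk, inferred from the sample output above
// (an assumption for illustration, not a published type).
type SttChunk = {
  timestamp: [number, number]; // [start, end], apparently in seconds
  text: string;
  speaker: string; // e.g. "SPEAKER_02" or "UNKNOWN"
};

type Turn = { speaker: string; start: number; end: number; text: string };

// Merge runs of consecutive chunks that share a speaker label into one turn.
function mergeTurns(chunks: SttChunk[]): Turn[] {
  const turns: Turn[] = [];
  for (const { timestamp: [start, end], text, speaker } of chunks) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === speaker) {
      last.end = Math.max(last.end, end);
      last.text = `${last.text} ${text}`;
    } else {
      turns.push({ speaker, start, end, text });
    }
  }
  return turns;
}

// Sum chunk durations per speaker label (overlapping chunks are counted in full).
function talkTime(chunks: SttChunk[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const { timestamp: [start, end], speaker } of chunks) {
    totals.set(speaker, (totals.get(speaker) ?? 0) + Math.max(0, end - start));
  }
  return totals;
}

// A few chunks copied from the sample output above.
const sample: SttChunk[] = [
  { timestamp: [0, 2.12], text: "We don't teach leaders how to have uncomfortable conversations.", speaker: "SPEAKER_02" },
  { timestamp: [2.38, 4.54], text: "We don't teach students how to have uncomfortable conversations.", speaker: "SPEAKER_02" },
  { timestamp: [10.4, 13.08], text: "Described as a visionary thinker with a rare intellect.", speaker: "SPEAKER_01" },
  { timestamp: [13.24, 14.88], text: "Multiple time best-selling author.", speaker: "SPEAKER_02" },
];

for (const t of mergeTurns(sample)) {
  console.log(`[${t.start.toFixed(2)}-${t.end.toFixed(2)}] ${t.speaker}: ${t.text}`);
}
talkTime(sample).forEach((seconds, speaker) => {
  console.log(`${speaker}: ${seconds.toFixed(2)} s`);
});
```

Merging only by adjacency means a turn interrupted by another speaker is split in two, and summing raw chunk durations double-counts any overlap between chunks (the sample output has some, such as the chunks starting at 139.02 and 139.58), so treat the totals as approximate.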