Use this function to download the caption of a YouTube video as a tidy tibble and, optionally, save it as an Excel file in your current working directory.

get_caption(
  url = NULL,
  language = "en",
  savexl = FALSE,
  openxl = FALSE,
  path = getwd()
)

Arguments

url

A string value for a single YouTube video link URL. A typical form should start with "https://www.youtube.com/watch?v=" followed by a unique video ID.
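As an illustration of the expected URL form, the video ID can be pulled out of such a link with base R. This is only a sketch: extract_video_id is a hypothetical helper, not part of youtubecaption, and it assumes the ID is the value of the "v" query parameter.

```r
# Hypothetical helper (not part of youtubecaption): pull the video ID
# out of a standard "watch?v=" URL. Assumes the ID follows "v=" and
# consists of letters, digits, "_" or "-".
extract_video_id <- function(url) {
  stopifnot(is.character(url), length(url) == 1)
  m <- regmatches(url, regexec("[?&]v=([A-Za-z0-9_-]+)", url))[[1]]
  # m[1] is the full match, m[2] the captured ID; no match gives character(0)
  if (length(m) < 2) NA_character_ else m[2]
}

extract_video_id("https://www.youtube.com/watch?v=cpbtcsGE0OA")
```

A URL without a "v" parameter yields NA rather than an error, which makes it easy to check a link before passing it on.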

language

A two-character language code for the caption to download. Set to "en" (English) by default. You can change this to fit your needs (e.g., "ko" for Korean, "de" for German, etc.).

savexl

A logical value for determining whether or not to save the obtained tidy YouTube caption data as an Excel file. The default is FALSE, which does not save a file. If set to TRUE, a file named "YouTube_caption_videoID.xlsx" is saved in the directory given by path (your current working directory by default).

openxl

A logical value for determining whether or not to open the saved YouTube_caption Excel file, if there is one. The default is FALSE. TRUE takes effect only when the preceding argument (i.e., savexl) is also set to TRUE.

path

A character vector of full path names giving the directory in which the Excel file is saved; the default corresponds to the working directory, getwd(). Tilde expansion (see path.expand) is performed. Missing values will be ignored.
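To make the interplay between savexl and path concrete, the sketch below builds the location where the Excel file would be expected. It assumes that "videoID" in the file name stands for the actual video ID, which the description of savexl suggests but does not state outright; caption_file is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: where the saved file is expected to land.
# Assumption: "videoID" in "YouTube_caption_videoID.xlsx" is replaced
# by the real video ID.
caption_file <- function(video_id, path = getwd()) {
  # path.expand() resolves a leading "~", mirroring the tilde expansion
  # described for the path argument
  file.path(path.expand(path), paste0("YouTube_caption_", video_id, ".xlsx"))
}

caption_file("cpbtcsGE0OA", path = tempdir())
```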

Value

A tibble (an enhanced data.frame) holding the caption of the YouTube video is returned, with one row per caption segment and the columns segment_id, text, start, duration, and vid.
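Because the result is a regular data frame, it can be post-processed with ordinary R code. The sketch below uses a small hand-built stand-in rather than a live download, so it runs without a network connection; the values are the first three rows of the example output in the Examples section, and the derived end column and transcript variable are illustrative names only.

```r
# Stand-in for a get_caption() result; a plain data.frame is used so that
# no extra packages are needed.
caption <- data.frame(
  segment_id = 1:3,
  text = c("thank you for coming to a meeting today",
           "in regards to data science GUI with",
           "happy with chief data scientist in our"),
  start = c(7.13, 10.7, 15.4),
  duration = c(8.32, 8.44, 7.11),
  vid = "cpbtcsGE0OA",
  stringsAsFactors = FALSE
)

# End of each segment, in the same units as `start`
caption$end <- caption$start + caption$duration

# Collapse the segments into a single transcript string
transcript <- paste(caption$text, collapse = " ")
```

The same two steps should apply unchanged to the tibble returned by get_caption, since a tibble supports the usual `$` column access.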

Details

get_caption

See example below.

References

https://pypi.org/project/youtube-transcript-api/

Examples

# \donttest{
library(youtubecaption)
# Let's get the video caption out of Hadley Wickham's "You can't do data science in a GUI":
url <- "https://www.youtube.com/watch?v=cpbtcsGE0OA"
caption <- get_caption(url)
#> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
#> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
caption
#> # A tibble: 1,420 x 5
#>    segment_id text                                     start duration vid
#>         <int> <chr>                                    <dbl>    <dbl> <chr>
#>  1          1 thank you for coming to a meeting today   7.13     8.32 cpbtcsGE0~
#>  2          2 in regards to data science GUI with      10.7      8.44 cpbtcsGE0~
#>  3          3 happy with chief data scientist in our   15.4      7.11 cpbtcsGE0~
#>  4          4 studio as well as the member of the our  19.1      7.23 cpbtcsGE0~
#>  5          5 Foundation and an attempt professor at   22.6      6    cpbtcsGE0~
#>  6          6 Stanford and at the University of        26.4      6.48 cpbtcsGE0~
#>  7          7 Auckland he builds both computational    28.6      7.17 cpbtcsGE0~
#>  8          8 and cognitive tools to make data science 32.8      7.5  cpbtcsGE0~
#>  9          9 easier faster and more times his work    35.7      7.01 cpbtcsGE0~
#> 10         10 includes various packages as well as     40.4      6.21 cpbtcsGE0~
#> # ... with 1,410 more rows
# Save the caption as an Excel file and open it right away
## Changing path to temp for demonstration purposes only:
get_caption(url = url, savexl = TRUE, openxl = TRUE, path = tempdir())
#> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
#> Warning: path[1]="C:\Anaconda3\envs\bookworm/python.exe": The system cannot find the file specified
#> # A tibble: 1,420 x 5
#>    segment_id text                                     start duration vid
#>         <int> <chr>                                    <dbl>    <dbl> <chr>
#>  1          1 thank you for coming to a meeting today   7.13     8.32 cpbtcsGE0~
#>  2          2 in regards to data science GUI with      10.7      8.44 cpbtcsGE0~
#>  3          3 happy with chief data scientist in our   15.4      7.11 cpbtcsGE0~
#>  4          4 studio as well as the member of the our  19.1      7.23 cpbtcsGE0~
#>  5          5 Foundation and an attempt professor at   22.6      6    cpbtcsGE0~
#>  6          6 Stanford and at the University of        26.4      6.48 cpbtcsGE0~
#>  7          7 Auckland he builds both computational    28.6      7.17 cpbtcsGE0~
#>  8          8 and cognitive tools to make data science 32.8      7.5  cpbtcsGE0~
#>  9          9 easier faster and more times his work    35.7      7.01 cpbtcsGE0~
#> 10         10 includes various packages as well as     40.4      6.21 cpbtcsGE0~
#> # ... with 1,410 more rows
# }