Title: | Conversation Similarity Analysis Package |
---|---|
Description: | A comprehensive toolkit for analyzing and comparing conversations. This package provides functions to calculate various similarity measures between conversations, including topic, lexical, semantic, structural, stylistic, sentiment, participant, and timing similarities. It supports both pairwise conversation comparisons and analysis of multiple dyads. |
Authors: | Chao Liu [aut, cre, cph] |
Maintainer: | Chao Liu <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2024-11-20 05:22:13 UTC |
Source: | https://github.com/chaoliu-cl/conversim |
Aggregate similarity sequence for a single dyad
agg_seq(sequence, num_segments)
sequence | A numeric vector of similarity scores for a single dyad |
num_segments | The number of segments to aggregate into |
This function aggregates a similarity sequence into a specified number of segments for a single dyad.
A numeric vector of aggregated similarity scores
seq <- c(0.5, 0.6, 0.7, 0.6, 0.8, 0.7, 0.9, 0.8, 0.7, 0.8)
# Aggregate the sequence into 3 segments
agg_3 <- agg_seq(seq, 3)
print(agg_3)
# Aggregate the sequence into 5 segments
agg_5 <- agg_seq(seq, 5)
print(agg_5)
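The aggregation rule is not spelled out above. The sketch below assumes each segment is summarized by its mean, which matches the return type but may differ from the package's actual rule; agg_seq_sketch is a hypothetical helper, not a package function.
# Hypothetical re-implementation for illustration only (assumes mean aggregation)
agg_seq_sketch <- function(sequence, num_segments) {
  bins <- cut(seq_along(sequence), breaks = num_segments, labels = FALSE)
  as.numeric(tapply(sequence, bins, mean))
}
agg_seq_sketch(c(0.5, 0.6, 0.7, 0.6, 0.8, 0.7, 0.9, 0.8, 0.7, 0.8), 3)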
This function calculates the correlation between different similarity measures.
calc_sim_cor(comparison_df)
comparison_df | A data frame output from compare_sim_meas() |
A correlation matrix
topic_similarities <- list("1" = c(0.5, 0.6, 0.7), "2" = c(0.4, 0.5, 0.6))
lexical_similarities <- list("1" = c(0.6, 0.7, 0.8), "2" = c(0.5, 0.6, 0.7))
comparison_df <- compare_sim_meas(
  list(topic_similarities, lexical_similarities),
  c("Topic", "Lexical")
)
calc_sim_cor(comparison_df)
This function calculates a sequence of similarities between consecutive windows in a conversation.
calc_sim_seq(conversation, window_size, similarity_func)
conversation | A dataframe containing the conversation, with a column named 'processed_text'. |
window_size | An integer specifying the size of each window. |
similarity_func | A function that calculates similarity between two text strings. |
A list containing two elements:
sequence | A numeric vector of similarity scores between consecutive windows |
average | The mean of the similarity scores |
conversation <- data.frame(processed_text = c("hello", "world", "how", "are", "you"))
result <- calc_sim_seq(conversation, 2, function(x, y) sum(x == y) / max(length(x), length(y)))
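As a rough illustration of the procedure described above — split the conversation into windows of window_size utterances and score consecutive pairs with similarity_func — here is a self-contained sketch; sim_seq_sketch is hypothetical and the package's windowing details may differ.
# Illustrative sketch only: pair each window with the next one and score the pair
sim_seq_sketch <- function(texts, window_size, similarity_func) {
  windows <- split(texts, ceiling(seq_along(texts) / window_size))
  scores <- mapply(similarity_func, head(windows, -1), tail(windows, -1))
  list(sequence = as.numeric(scores), average = mean(scores))
}
sim_seq_sketch(c("hello", "world", "how", "are", "you"), 2,
               function(x, y) sum(x == y) / max(length(x), length(y)))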
This function calculates summary statistics for the similarities of multiple dyads.
calc_sum_stats(similarities)
similarities | A list of similarity sequences for each dyad |
A matrix with summary statistics for each dyad
similarities <- list(
  "1" = c(0.5, 0.6, 0.7),
  "2" = c(0.4, 0.5, 0.6)
)
calc_sum_stats(similarities)
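The exact statistics in the returned matrix are not listed above; a minimal sketch assuming a mean/sd/min/max summary per dyad (sum_stats_sketch is a hypothetical helper, not the package function):
# Hypothetical sketch: one row of assumed summary statistics per dyad
sum_stats_sketch <- function(similarities) {
  t(sapply(similarities, function(x) c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))))
}
sum_stats_sketch(list("1" = c(0.5, 0.6, 0.7), "2" = c(0.4, 0.5, 0.6)))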
Combine similarity measures for a single dyad
combine_sim_seq(similarities, weights = NULL)
similarities | A list of similarity measures for a single dyad |
weights | A numeric vector of weights for each similarity measure (default is equal weights) |
This function combines multiple similarity measures into a single overall similarity score for a single dyad.
A list containing the combined sequence and average similarity
sim1 <- list(sequence = c(0.8, 0.7, 0.9), average = 0.8)
sim2 <- list(sequence = c(0.6, 0.8, 0.7), average = 0.7)
combine_sim_seq(list(sim1, sim2))
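For intuition only, a sketch of one plausible combination rule — a weighted element-wise average of the sequences; combine_sketch is hypothetical and the package may combine measures differently.
# Illustrative sketch assuming a weighted element-wise average of the sequences
combine_sketch <- function(similarities, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(similarities))
  weights <- weights / sum(weights)
  seqs <- sapply(similarities, `[[`, "sequence")   # one column per measure
  combined <- as.numeric(seqs %*% weights)
  list(sequence = combined, average = mean(combined))
}
combine_sketch(list(list(sequence = c(0.8, 0.7, 0.9)),
                    list(sequence = c(0.6, 0.8, 0.7))))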
Combine multiple similarity measures
This file contains utility functions and visualization tools to complement the main similarity calculation functions for comparing two speeches.
combine_sims(similarities, weights = NULL)
similarities | A named list of similarity scores |
weights | A named list of weights for each similarity measure (optional) |
This function combines multiple similarity measures into a single score.
A single combined similarity score
sims <- list(topic = 0.8, lexical = 0.6, semantic = 0.7, structural = 0.9)
combine_sims(sims)
combine_sims(sims, weights = list(topic = 2, lexical = 1, semantic = 1.5, structural = 1))
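For intuition, a sketch assuming the combined score is a weighted arithmetic mean of the named scores; combine_sims_sketch is a hypothetical helper and the package's formula may differ.
# Minimal sketch assuming a weighted arithmetic mean of the named scores
combine_sims_sketch <- function(similarities, weights = NULL) {
  s <- unlist(similarities)
  w <- if (is.null(weights)) rep(1, length(s)) else unlist(weights)[names(s)]
  sum(s * w) / sum(w)
}
combine_sims_sketch(list(topic = 0.8, lexical = 0.6, semantic = 0.7, structural = 0.9),
                    weights = list(topic = 2, lexical = 1, semantic = 1.5, structural = 1))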
This function compares multiple similarity measures for the same set of dyads.
compare_sim_meas(similarity_list, measure_names)
similarity_list | A list of lists, where each inner list contains similarities for each dyad |
measure_names | A vector of names for each similarity measure |
A data frame with all similarity measures for each dyad
topic_similarities <- list("1" = c(0.5, 0.6, 0.7), "2" = c(0.4, 0.5, 0.6))
lexical_similarities <- list("1" = c(0.6, 0.7, 0.8), "2" = c(0.5, 0.6, 0.7))
compare_sim_meas(
  list(topic_similarities, lexical_similarities),
  c("Topic", "Lexical")
)
This function visualizes the comparison of stylistic features between two speeches.
compare_style(stylistic_result)
stylistic_result | The result from the stylistic_similarity function |
A ggplot object
text1 <- "The quick brown fox jumps over the lazy dog. It's a sunny day." text2 <- "A lazy cat sleeps on the warm windowsill. Birds chirp outside." result <- stylistic_similarity(text1, text2) compare_style(result)
text1 <- "The quick brown fox jumps over the lazy dog. It's a sunny day." text2 <- "A lazy cat sleeps on the warm windowsill. Birds chirp outside." result <- stylistic_similarity(text1, text2) compare_style(result)
Calculate Correlation Between Similarity Measures for a Single Dyad
cor_sim_seq(similarities, method = "pearson")
similarities | A list of similarity measures for a single dyad |
method | The correlation method to use (default is "pearson") |
This function calculates the correlation between different similarity measures for a single dyad.
A correlation matrix
sim1 <- list(sequence = c(0.8, 0.7, 0.9), average = 0.8)
sim2 <- list(sequence = c(0.6, 0.8, 0.7), average = 0.7)
cor_sim_seq(list(sim1, sim2))
This function creates a list of windows from a conversation dataframe.
create_windows(conversation, window_size)
conversation | A dataframe containing the conversation, with a column named 'processed_text'. |
window_size | An integer specifying the size of each window. |
A list of character vectors, where each vector represents a window of text.
conversation <- data.frame(processed_text = c("hello", "world", "how", "are", "you"))
windows <- create_windows(conversation, 3)
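Whether windows overlap is not stated above; the sketch below assumes an overlapping sliding window with a step of 1. create_windows_sketch is a hypothetical helper, not the package function.
# Hypothetical sketch assuming overlapping sliding windows with a step of 1
create_windows_sketch <- function(conversation, window_size) {
  texts <- conversation$processed_text
  starts <- seq_len(length(texts) - window_size + 1)
  lapply(starts, function(i) texts[i:(i + window_size - 1)])
}
create_windows_sketch(data.frame(processed_text = c("hello", "world", "how", "are", "you")), 3)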
This function generates a comprehensive report of all similarity measures.
gen_sim_report(speech1, speech2, topic_method = "lda", semantic_method = "tfidf", glove_path = NULL)
speech1 | A character string representing the first speech |
speech2 | A character string representing the second speech |
topic_method | Method for topic similarity calculation ("lda" or "lsa") |
semantic_method | Method for semantic similarity calculation ("tfidf", "word2vec", or "glove") |
glove_path | Path to pre-trained GloVe file (if using "glove" method) |
A list containing all similarity measures and visualizations
speech1 <- "This is the first speech. It talks about important topics." speech2 <- "This is the second speech. It covers similar subjects." report <- gen_sim_report(speech1, speech2)
speech1 <- "This is the first speech. It talks about important topics." speech2 <- "This is the second speech. It covers similar subjects." report <- gen_sim_report(speech1, speech2)
Create a heatmap of similarity measures for a single dyad
heatmap_sim(similarities, titles)
similarities | A list of similarity measures for a single dyad |
titles | A character vector of titles for each similarity measure |
This function creates a heatmap of multiple similarity measures for a single dyad.
A ggplot object
sim1 <- list(sequence = c(0.5, 0.6, 0.7, 0.6, 0.8), average = 0.64)
sim2 <- list(sequence = c(0.4, 0.5, 0.6, 0.7, 0.7), average = 0.58)
similarities <- list(sim1, sim2)
titles <- c("Measure 1", "Measure 2")
# Create a heatmap of the similarity measures
plot <- heatmap_sim(similarities, titles)
print(plot)
This function calculates lexical similarity over a sequence of conversation exchanges within a single dyad.
lex_sim_seq(conversation, window_size = 3)
conversation | A data frame representing the conversation |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities and the average similarity
conversation <- data.frame(
  processed_text = c("Hello world", "World of programming",
                     "Programming is fun", "Fun world of coding")
)
result <- lex_sim_seq(conversation, window_size = 2)
print(result)
This function calculates lexical similarity over a sequence of conversation exchanges for multiple dyads.
lexical_sim_dyads(conversations, window_size = 3)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities for each dyad and the overall average similarity
library(lme4)
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
lexical_sim_dyads(convs, window_size = 2)
This function calculates the lexical similarity between two conversations based on the overlap of unique words.
lexical_similarity(conv1, conv2)
conv1 | A character string representing the first conversation |
conv2 | A character string representing the second conversation |
A numeric value representing the lexical similarity
conv1 <- "The quick brown fox jumps over the lazy dog" conv2 <- "The lazy dog sleeps under the quick brown fox" lexical_similarity(conv1, conv2)
conv1 <- "The quick brown fox jumps over the lazy dog" conv2 <- "The lazy dog sleeps under the quick brown fox" lexical_similarity(conv1, conv2)
Normalize similarity scores
norm_sim(similarities)
similarities | A numeric vector of similarity scores |
This function normalizes similarity scores to a 0-1 range.
A numeric vector of normalized similarity scores
similarities <- c(0.2, 0.5, 0.8, 1.0, 0.3)
norm_sim(similarities)
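A minimal sketch assuming standard min-max scaling to the 0-1 range; norm_sim_sketch is a hypothetical helper and the package's exact scaling may differ.
# Minimal sketch: min-max scaling (assumed)
norm_sim_sketch <- function(similarities) {
  (similarities - min(similarities)) / (max(similarities) - min(similarities))
}
norm_sim_sketch(c(0.2, 0.5, 0.8, 1.0, 0.3))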
This function calculates an extended measure of participant similarity for multiple dyads.
participant_sim_dyads(conversations)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
A list containing participant similarity for each dyad and the overall average similarity
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
participant_sim_dyads(convs)
Plot Correlation Heatmap for a Single Dyad
plot_cor_heatmap(cor_matrix, titles)
cor_matrix | A correlation matrix for a single dyad |
titles | A character vector of titles for each similarity measure |
This function creates a heatmap of correlations between similarity measures for a single dyad.
A ggplot object
sim1 <- list(sequence = c(0.8, 0.7, 0.9), average = 0.8)
sim2 <- list(sequence = c(0.6, 0.8, 0.7), average = 0.7)
cor_matrix <- cor_sim_seq(list(sim1, sim2))
plot_cor_heatmap(cor_matrix, c("Topic", "Lexical"))
This function creates a ggplot object comparing multiple similarity measures for the same set of dyads.
plot_sim_comp(comparison_df, title)
comparison_df | A data frame output from compare_sim_meas() |
title | A string specifying the plot title |
A ggplot object
topic_similarities <- list("1" = c(0.5, 0.6, 0.7), "2" = c(0.4, 0.5, 0.6))
lexical_similarities <- list("1" = c(0.6, 0.7, 0.8), "2" = c(0.5, 0.6, 0.7))
comparison_df <- compare_sim_meas(
  list(topic_similarities, lexical_similarities),
  c("Topic", "Lexical")
)
plot_sim_comp(comparison_df, "Comparison of Similarity Measures")
This function creates a ggplot object showing a heatmap of correlations between similarity measures.
plot_sim_cor_heatmap(cor_matrix, title)
cor_matrix | A correlation matrix output from calc_sim_cor() |
title | A string specifying the plot title |
A ggplot object
topic_similarities <- list("1" = c(0.5, 0.6, 0.7), "2" = c(0.4, 0.5, 0.6))
lexical_similarities <- list("1" = c(0.6, 0.7, 0.8), "2" = c(0.5, 0.6, 0.7))
comparison_df <- compare_sim_meas(
  list(topic_similarities, lexical_similarities),
  c("Topic", "Lexical")
)
cor_matrix <- calc_sim_cor(comparison_df)
plot_sim_cor_heatmap(cor_matrix, "Correlation of Similarity Measures")
Plot multiple similarity measures for a single dyad
plot_sim_multi(similarities, titles)
similarities | A list of similarity measures for a single dyad |
titles | A character vector of titles for each similarity measure |
This function creates a faceted plot of multiple similarity measures for a single dyad.
A ggplot object
sim1 <- list(sequence = c(0.5, 0.6, 0.7, 0.6, 0.8), average = 0.64)
sim2 <- list(sequence = c(0.4, 0.5, 0.6, 0.7, 0.7), average = 0.58)
similarities <- list(sim1, sim2)
titles <- c("Measure 1", "Measure 2")
# Plot multiple similarity measures
plot <- plot_sim_multi(similarities, titles)
print(plot)
Plot similarity sequence for a single dyad
plot_sim_seq(similarity, title)
similarity | A list containing the sequence of similarities and the average similarity |
title | A character string for the plot title |
This function creates a line plot of the similarity sequence for a single dyad.
A ggplot object
sim_list <- list(
  sequence = c(0.5, 0.6, 0.7, 0.6, 0.8),
  average = 0.64
)
# Plot the similarity sequence
plot <- plot_sim_seq(sim_list, "Dyad Similarity Sequence")
print(plot)
This function creates a ggplot object showing the similarity over time for multiple dyads.
plot_sim_time(similarities, title, y_label)
similarities | A list of similarity sequences for each dyad |
title | A string specifying the plot title |
y_label | A string specifying the y-axis label |
A ggplot object
similarities <- list(
  "1" = c(0.5, 0.6, 0.7),
  "2" = c(0.4, 0.5, 0.6)
)
plot_sim_time(similarities, "Topic Similarity", "Similarity Score")
This function creates a bar plot of similarity scores.
plot_sims(similarities)
similarities | A named list of similarity scores |
A ggplot object
sims <- list(topic = 0.8, lexical = 0.6, semantic = 0.7, structural = 0.9)
plot_sims(sims)
This function creates a ggplot object showing summary statistics for similarities of multiple dyads.
plot_sum_stats(summary_stats, title)
summary_stats | A data frame with summary statistics for each dyad |
title | A string specifying the plot title |
A ggplot object
similarities <- list(
  "1" = c(0.5, 0.6, 0.7),
  "2" = c(0.4, 0.5, 0.6)
)
stats <- calc_sum_stats(similarities)
plot_sum_stats(stats, "Summary Statistics of Similarities")
This function preprocesses conversations from multiple dyads by applying text cleaning to each utterance.
preprocess_dyads(conversations)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'text' |
A data frame with an additional 'processed_text' column, removing any rows with empty processed text
convs <- data.frame(
  dyad_id = c(1, 1, 2, 2),
  speaker = c("A", "B", "C", "D"),
  text = c("Hello!", "Hi there!", "How are you?", "I'm fine, thanks!")
)
preprocess_dyads(convs)
Preprocess text for analysis
preprocess_text(text)
text | A character string to be preprocessed |
This function preprocesses the input text by converting to lowercase, removing punctuation and digits, and trimming whitespace.
A preprocessed character string
text <- "Hello, World! This is an example text (with 123 numbers)." preprocess_text(text)
text <- "Hello, World! This is an example text (with 123 numbers)." preprocess_text(text)
This function prints a formatted summary of the similarity report.
print_sim_report(report)
report | A similarity report generated by the gen_sim_report function |
NULL (invisibly). This function is called for its side effect of printing to the console.
speech1 <- "This is the first speech. It talks about important topics." speech2 <- "This is the second speech. It covers similar subjects." report <- gen_sim_report(speech1, speech2) print_sim_report(report)
speech1 <- "This is the first speech. It talks about important topics." speech2 <- "This is the second speech. It covers similar subjects." report <- gen_sim_report(speech1, speech2) print_sim_report(report)
Create a radar chart of average similarities for a single dyad
radar_sim(similarities, titles)
similarities | A list of similarity measures for a single dyad |
titles | A character vector of titles for each similarity measure |
This function creates a radar chart of average similarities for multiple measures of a single dyad.
A ggplot object
sim1 <- list(sequence = c(0.5, 0.6, 0.7, 0.6, 0.8), average = 0.64)
sim2 <- list(sequence = c(0.4, 0.5, 0.6, 0.7, 0.7), average = 0.58)
sim3 <- list(sequence = c(0.6, 0.7, 0.8, 0.7, 0.9), average = 0.74)
sim4 <- list(sequence = c(0.3, 0.4, 0.5, 0.6, 0.6), average = 0.48)
similarities <- list(sim1, sim2, sim3, sim4)
titles <- c("Measure 1", "Measure 2", "Measure 3", "Measure 4")
# Create radar chart
radar <- radar_sim(similarities, titles)
print(radar)
Run package examples
run_example(example_name)
example_name | Name of the example file to run |
No return value, called for side effects.
## Not run:
run_example("sequence_multidyads_examples.R")
run_example("main_functions_examples.R")
## End(Not run)
This function calculates semantic similarity over a sequence of conversation exchanges within a single dyad.
sem_sim_seq(conversation, method = "tfidf", window_size = 3, ...)
conversation | A data frame representing the conversation |
method | A character string specifying the method to use: "tfidf", "word2vec", or "glove" |
window_size | An integer specifying the size of the sliding window |
... | Additional arguments passed to semantic_similarity |
A list containing the sequence of similarities and the average similarity
conversation <- data.frame(
  processed_text = c("The weather is nice", "It's a beautiful day",
                     "The sun is shining", "Perfect day for a picnic")
)
result <- sem_sim_seq(conversation, method = "tfidf", window_size = 2)
print(result)
This function calculates semantic similarity over a sequence of conversation exchanges for multiple dyads.
semantic_sim_dyads(conversations, method = "tfidf", window_size = 3, ...)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
method | A character string specifying the method to use: "tfidf", "word2vec", or "glove" |
window_size | An integer specifying the size of the sliding window |
... | Additional arguments passed to semantic_similarity |
A list containing the sequence of similarities for each dyad and the overall average similarity
library(lme4)
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
semantic_sim_dyads(convs, method = "tfidf", window_size = 2)
This function calculates the semantic similarity between two conversations using either TF-IDF, Word2Vec, or GloVe embeddings approach.
semantic_similarity(conversation1, conversation2, method = "tfidf", model_path = NULL, dim = 100, window = 5, iter = 5)
conversation1 | A character string representing the first conversation |
conversation2 | A character string representing the second conversation |
method | A character string specifying the method to use: "tfidf", "word2vec", or "glove" |
model_path | A character string specifying the path to pre-trained GloVe file (required for "glove" method) |
dim | An integer specifying the dimensionality for Word2Vec embeddings (default: 100) |
window | An integer specifying the window size for Word2Vec (default: 5) |
iter | An integer specifying the number of iterations for Word2Vec (default: 5) |
A numeric value representing the semantic similarity (between 0 and 1)
conv1 <- "The quick brown fox jumps over the lazy dog" conv2 <- "A fast auburn canine leaps above an idle hound" semantic_similarity(conv1, conv2, method = "tfidf")
conv1 <- "The quick brown fox jumps over the lazy dog" conv2 <- "A fast auburn canine leaps above an idle hound" semantic_similarity(conv1, conv2, method = "tfidf")
This function calculates sentiment similarity over a sequence of conversation exchanges within a single dyad.
sent_sim_seq(conversation, window_size = 3)
conversation | A data frame representing the conversation |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities and the average similarity
conversation <- data.frame(
  processed_text = c("I love this movie!", "It's really amazing.",
                     "The acting is superb.", "I couldn't agree more.")
)
result <- sent_sim_seq(conversation, window_size = 2)
print(result)
This function calculates sentiment similarity over a sequence of conversation exchanges for multiple dyads.
sentiment_sim_dyads(conversations, window_size = 3)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities for each dyad and the overall average similarity
library(lme4)
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
sentiment_sim_dyads(convs, window_size = 2)
This function calculates the sentiment similarity between two conversations using the sentimentr package.
sentiment_similarity(conv1, conv2)
conv1 | A character string representing the first conversation |
conv2 | A character string representing the second conversation |
A numeric value representing the sentiment similarity
conv1 <- "I love this product! It's amazing and works great." conv2 <- "This item is okay. It does the job but could be better." sentiment_similarity(conv1, conv2)
conv1 <- "I love this product! It's amazing and works great." conv2 <- "This item is okay. It does the job but could be better." sentiment_similarity(conv1, conv2)
This function calculates an extended measure of structural similarity for multiple dyads.
structural_sim_dyads(conversations)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
A list containing structural similarity for each dyad and the overall average similarity
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
structural_sim_dyads(convs)
This function calculates the structural similarity between two conversations based on their length and average turn length.
structural_similarity(conv1, conv2)
conv1 | A character vector representing the first conversation |
conv2 | A character vector representing the second conversation |
A numeric value representing the structural similarity
conv1 <- c("Hello", "Hi there", "How are you?", "I'm fine, thanks") conv2 <- c("Good morning", "Hello", "Nice day, isn't it?", "Yes, indeed") structural_similarity(conv1, conv2)
conv1 <- c("Hello", "Hi there", "How are you?", "I'm fine, thanks") conv2 <- c("Good morning", "Hello", "Nice day, isn't it?", "Yes, indeed") structural_similarity(conv1, conv2)
This function calculates stylistic similarity over a sequence of conversation exchanges within a single dyad.
style_sim_seq(conversation, window_size = 3)
conversation | A data frame representing the conversation |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities and the average similarity
conversation <- data.frame(
  processed_text = c("How are you doing?", "I'm doing great, thanks!",
                     "That's wonderful to hear.", "I'm glad you're doing well.")
)
result <- style_sim_seq(conversation, window_size = 2)
print(result)
This function calculates stylistic similarity over a sequence of conversation exchanges for multiple dyads.
stylistic_sim_dyads(conversations, window_size = 3)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities for each dyad and the overall average similarity
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
stylistic_sim_dyads(convs, window_size = 2)
This function calculates various stylistic features and their similarity between two conversations.
stylistic_similarity(text1, text2)
text1 | A character string representing the first conversation |
text2 | A character string representing the second conversation |
A list containing stylistic features and similarity measures
text1 <- "The quick brown fox jumps over the lazy dog. It's a sunny day." text2 <- "A lazy cat sleeps on the warm windowsill. Birds chirp outside." stylistic_similarity(text1, text2)
text1 <- "The quick brown fox jumps over the lazy dog. It's a sunny day." text2 <- "A lazy cat sleeps on the warm windowsill. Birds chirp outside." stylistic_similarity(text1, text2)
This function calculates an extended measure of timing similarity for multiple dyads.
timing_sim_dyads(conversations)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
A list containing timing similarity for each dyad and the overall average similarity
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
timing_sim_dyads(convs)
This function calculates topic similarity over a sequence of conversation exchanges for multiple dyads. It uses the Latent Dirichlet Allocation (LDA) method for topic modeling and the "slam" package for efficient handling of sparse matrices.
topic_sim_dyads(conversations, method = "lda", num_topics = 2, window_size = 3)
conversations | A data frame with columns 'dyad_id', 'speaker', and 'processed_text' |
method | A character string specifying the method to use: currently only "lda" is supported |
num_topics | An integer specifying the number of topics to use in the LDA model |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities for each dyad and the overall average similarity
convs <- data.frame(
  dyad_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("A", "B", "A", "B", "C", "D", "C", "D"),
  processed_text = c("i love pizza", "me too favorite food",
                     "whats your favorite topping", "enjoy pepperoni mushrooms",
                     "i prefer pasta", "pasta delicious like spaghetti carbonara",
                     "ever tried making home", "yes quite easy make")
)
topic_sim_dyads(convs, method = "lda", num_topics = 2, window_size = 2)
This function calculates topic similarity over a sequence of conversation exchanges within a single dyad.
topic_sim_seq(conversation, method = "lda", num_topics = 2, window_size = 3)
conversation | A data frame representing the conversation |
method | A character string specifying the method to use: "lda" or "lsa" |
num_topics | An integer specifying the number of topics to use in the model |
window_size | An integer specifying the size of the sliding window |
A list containing the sequence of similarities and the average similarity
conversation <- data.frame(
  processed_text = c("The cat sat on the mat", "The dog chased the cat",
                     "The mat was comfortable", "The cat liked the mat")
)
result <- topic_sim_seq(conversation, method = "lda", num_topics = 2, window_size = 2)
print(result)
This function calculates the topic similarity between two conversations using either Latent Dirichlet Allocation (LDA) or Latent Semantic Analysis (LSA).
topic_similarity(conv1, conv2, method = "lda", num_topics = 2)
conv1 | A character vector representing the first conversation |
conv2 | A character vector representing the second conversation |
method | A character string specifying the method to use: "lda" or "lsa" |
num_topics | An integer specifying the number of topics to use in the model |
A numeric value representing the topic similarity
conv1 <- c("I love pizza", "Pizza is my favorite food") conv2 <- c("I prefer pasta", "Pasta is delicious") topic_similarity(conv1, conv2, method = "lda", num_topics = 2) topic_similarity(conv1, conv2, method = "lsa", num_topics = 2)
conv1 <- c("I love pizza", "Pizza is my favorite food") conv2 <- c("I prefer pasta", "Pasta is delicious") topic_similarity(conv1, conv2, method = "lda", num_topics = 2) topic_similarity(conv1, conv2, method = "lsa", num_topics = 2)