Please use this identifier to cite or link to this item: http://hdl.handle.net/10174/29056

Title: Evaluating the performance and improving the usability of parallel and distributed Word Embeddings tools
Authors: Silva, Mateus; Meyer, Vinicius; Kirchoff, Dionatrã; Neto, Joaquim; Vieira, Renata; De Rose, Cesar
Keywords: Language models
Issue Date: Mar-2020
Publisher: IEEE
Citation: M. L. d. Silva, V. Meyer, D. F. Kirchoff, F. S. Joaquim Neto, R. Vieira and A. F. César De Rose, "Evaluating the performance and improving the usability of parallel and distributed Word Embeddings tools," 2020 28th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), Västerås, Sweden, 2020, pp. 201-206, doi: 10.1109/PDP50117.2020.00038.
Abstract: The representation of words by means of vectors, also called Word Embeddings (WE), has been receiving great attention from the Natural Language Processing (NLP) field. WE models are able to express syntactic and semantic similarities, as well as relationships and contexts of words within a given corpus. Although the most popular implementations of WE algorithms present low scalability, there are new approaches that apply High-Performance Computing (HPC) techniques. This is an opportunity for an analysis of the main differences among the existing implementations, based on performance and scalability metrics. In this paper, we present a study which addresses resource utilization and performance aspects of known WE algorithms found in the literature. To improve scalability and usability, we propose a wrapper library for local and remote execution environments that contains a set of optimizations such as the pWord2vec, pWord2vec MPI, Wang2vec and the original Word2vec algorithm. Utilizing these optimizations, it is possible to achieve an average performance gain of 15x for multicores and 105x for multinodes compared to the original version. There is also a big reduction in the memory footprint compared to the most popular Python versions.
URI: https://doi.org/10.1109/PDP50117.2020.00038
https://ieeexplore.ieee.org/document/9092420
http://hdl.handle.net/10174/29056
Type: article
Appears in Collections: CIDEHUS - Artigos em Livros de Actas/Proceedings

Files in This Item:

File: 09092420.pdf
Size: 125.39 kB
Format: Adobe PDF
Access: Restricted; a copy may be requested.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

 
