Building Open Source Language Models
- Track: AI and Machine Learning devroom
- Room: UB2.252A (Lameere)
- Day: Sunday
- Start: 09:45
- End: 10:00
LINAGORA, a leader in the OpenLLM France community, has made it a priority to pull back the curtain on the process of building Large Language Models (LLMs). While most LLMs in use today, even the “open” ones, disclose few if any details about their training, and especially about the data on which they were trained, we have decided to share it all. In this talk, we discuss why using an open model trained on traceable data matters for business and research alike, and we examine some of the difficulties involved in pursuing an open strategy for LLMs. We share our experience with data collection and LLM training, including the Claire family of language models.
Links to the Claire models (a brief loading sketch follows the list):
- Main model: huggingface.co/OpenLLM-France/Claire-7B-0.1 (CC BY-NC-SA 4.0)
- Main model in GGUF formats: huggingface.co/TheBloke/Claire-7B-0.1-GGUF
- Variant model under the Apache 2.0 license: huggingface.co/OpenLLM-France/Claire-7B-Apache-0.1
- Demo (simulated chat): huggingface.co/spaces/OpenLLM-France/Claire-Chat-0.1
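As a quick illustration (a minimal sketch, not material from the talk), the main model can be loaded with the Hugging Face transformers library. The model ID comes from the links above; the dtype, device placement, prompt format, and generation settings are illustrative assumptions, so check the model card for recommended usage.

```python
# Minimal sketch: loading Claire-7B-0.1 with Hugging Face transformers.
# Generation settings below are illustrative assumptions, not the
# authors' recommendations; see the model card for details.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenLLM-France/Claire-7B-0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed; use float16/float32 as your hardware allows
    device_map="auto",           # requires the accelerate package
)

# Claire is trained on French dialogue transcripts, so a dialogue-style
# prompt is a natural fit (the speaker-label format here is an assumption).
prompt = "[Intervenant 1:] Bonjour, comment allez-vous ?\n[Intervenant 2:]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```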
Dataset & code (a loading sketch follows the list):
- Full dataset: huggingface.co/datasets/OpenLLM-France/Claire-Dialogue-French-0.1
- Paper: “The Claire French Dialogue Dataset”, arxiv.org/abs/2311.16840; also at https://huggingface.co/papers/2311.16840 (with links to related assets)
- Survey of source datasets: https://github.com/OpenLLM-France/Claire-datasets
- Code for training: https://github.com/OpenLLM-France/Lit-Claire
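Similarly, the full dialogue dataset can be pulled with the Hugging Face datasets library. This is a sketch under the assumption that the default configuration loads directly; the split name is also an assumption, so consult the dataset card.

```python
# Minimal sketch: loading the Claire French Dialogue Dataset.
# The dataset ID comes from the link above.
from datasets import load_dataset

ds = load_dataset("OpenLLM-France/Claire-Dialogue-French-0.1")
print(ds)  # shows available splits and row counts
# first_record = ds["train"][0]  # "train" split name is an assumption; check the dataset card
```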
Speakers
Julie Hunter