Merge pull request #14 from BoardWare-Genius/ivan

VLM implementation
This commit is contained in:
IvanWu
2024-08-20 18:09:41 +08:00
committed by GitHub
3 changed files with 377 additions and 290 deletions

README.md

@@ -1,92 +1,95 @@
# jarvis-models
## Conda Environment and Python Library Requirements
```bash
conda create -n jarvis-models python==3.10.11
pip install -r sample/requirement_out_of_pytorch.txt
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
```
## More Dependencies
| System | Package | Web | Install command |
| --- | --- | --- | --- |
| python | filetype | https://pypi.org/project/filetype/ | pip install filetype |
| python | fastAPI | https://fastapi.tiangolo.com/ | pip install fastapi |
| python | python-multipart | https://pypi.org/project/python-multipart/ | pip install python-multipart |
| python | uvicorn | https://www.uvicorn.org/ | pip install "uvicorn[standard]" |
| python | SpeechRecognition | https://pypi.org/project/SpeechRecognition/ | pip install SpeechRecognition |
| python | gTTS | https://pypi.org/project/gTTS/ | pip install gTTS |
| python | PyYAML | https://pypi.org/project/PyYAML/ | pip install PyYAML |
| python | injector | https://github.com/python-injector/injector | pip install injector |
| python | langchain | https://github.com/langchain-ai/langchain | pip install langchain |
| python | chromadb | https://docs.trychroma.com/getting-started | pip install chromadb |
| python | lagent | https://github.com/InternLM/lagent/blob/main/README.md | pip install lagent |
| python | sentence_transformers | https://github.com/UKPLab/sentence-transformers | pip install sentence_transformers |
## Start
Start the jarvis-models service via
```bash
uvicorn main:app --reload
```
or
```bash
python main.py
```
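For reference, a minimal sketch of what `python main.py` is assumed to do: serve the FastAPI app through uvicorn, reading the host and port from the `env` section of the configuration described below. The actual wiring of the repository's `main.py` is not shown in this diff.
```python
# Hypothetical sketch of main.py: serve the FastAPI app with uvicorn,
# reading host/port from the `env` section of .env.yaml (see the
# Configuration section below). The real main.py may differ.
import uvicorn
import yaml

if __name__ == "__main__":
    with open(".env.yaml") as f:
        env = yaml.safe_load(f).get("env", {})
    uvicorn.run("main:app",
                host=env.get("host", "0.0.0.0"),
                port=int(env.get("port", 8000)),
                reload=True)
```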
## Configuration
Create `.env.yaml` at the root of jarvis-models and copy in the following yaml configuration:
```yaml
env:
  version: 0.0.1
  host: 0.0.0.0
  port: 8000
log:
  level: debug
  time_format: "%Y-%m-%d %H:%M:%S"
  filename: "D:/Workspace/Logging/jarvis/jarvis-models.log"
melotts:
  mode: local # or docker
  url: http://10.6.44.16:18080/convert/tts
  speed: 0.9
  device: 'cuda'
  language: 'ZH'
  speaker: 'ZH'
tesou:
  url: http://120.196.116.194:48891/chat/
TokenIDConverter:
  token_path: src/asr/resources/models/token_list.pkl
  unk_symbol: <unk>
CharTokenizer:
  symbol_value:
  space_symbol: <space>
  remove_non_linguistic_symbols: false
WavFrontend:
  cmvn_file: src/asr/resources/models/am.mvn
  frontend_conf:
    fs: 16000
    window: hamming
    n_mels: 80
    frame_length: 25
    frame_shift: 10
    lfr_m: 7
    lfr_n: 6
    filter_length_max: -.inf
    dither: 0.0
Model:
  model_path: src/asr/resources/models/model.onnx
  use_cuda: false
  CUDAExecutionProvider:
    device_id: 0
    arena_extend_strategy: kNextPowerOfTwo
    cudnn_conv_algo_search: EXHAUSTIVE
    do_copy_in_default_stream: true
  batch_size: 3
blackbox:
  lazyloading: true
vlms:
  url: http://10.6.80.87:23333
```
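As a quick sanity check, the snippet below (a sketch, assuming `.env.yaml` sits in the working directory) loads the file with PyYAML and resolves dotted paths such as the newly added `vlms.url`, mirroring the `Configuration.get` lookup in the service's configuration module:
```python
# Minimal sketch: load .env.yaml and resolve a dotted path such as
# "vlms.url", the way the service's Configuration.get does.
import yaml

def get(cfg: dict, path: str, default=None):
    for key in path.split("."):
        if not isinstance(cfg, dict):
            return default
        cfg = cfg.get(key)
    return cfg if cfg is not None else default

with open(".env.yaml") as f:
    cfg = yaml.safe_load(f)

print(get(cfg, "vlms.url"))  # -> http://10.6.80.87:23333
print(get(cfg, "env.port"))  # -> 8000
```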

@@ -1,67 +1,144 @@
from fastapi import Request, Response, status
from fastapi.responses import JSONResponse
from injector import singleton, inject
from typing import Optional, List
from .blackbox import Blackbox
from ..log.logging_time import logging_time
# from .chroma_query import ChromaQuery
from ..configuration import VLMConf

import requests
import base64

import io
from PIL import Image
from lmdeploy.serve.openai.api_client import APIClient


def is_base64(value) -> bool:
    # Heuristic: treat the value as base64 if it survives decoding without
    # raising; the comparison result itself is discarded.
    try:
        base64.b64decode(base64.b64decode(value)) == value.encode()
        return True
    except Exception:
        return False


@singleton
class VLMS(Blackbox):

    @inject
    def __init__(self, vlm_config: VLMConf):
        # Chroma database initially set up for RAG for the vision model.
        # It could be extended to a history store.
        # self.chroma_query = chroma_query
        self.url = vlm_config.url

    def __call__(self, *args, **kwargs):
        return self.processing(*args, **kwargs)

    def valid(self, *args, **kwargs) -> bool:
        data = args[0]
        return isinstance(data, list)

    def processing(self, prompt: str, images: str | bytes, model_name: Optional[str] = None,
                   user_context: List[dict] = None) -> tuple[str, list]:
        """
        Args:
            prompt: a string query to the model.
            images: the image as a base64 string, raw bytes, or a URL string.
            user_context: conversation history as a list of OpenAI-format messages.

        Returns:
            response: a string
            history: a list
        """
        if model_name == "Qwen-VL-Chat":
            model_name = "infer-qwen-vl"
        elif model_name == "llava-llama-3-8b-v1_1-transformers":
            model_name = "infer-lav-lam-v1-1"
        else:
            model_name = "infer-qwen-vl"

        # Normalize the image to base64, as the OpenAI message format requires.
        if is_base64(images):  # image as a base64 str
            images_data = images
        elif isinstance(images, bytes):  # image as bytes
            images_data = str(base64.b64encode(images), 'utf-8')
        else:  # image as a URL str
            # with open(images, "rb") as img_file:
            #     images_data = str(base64.b64encode(img_file.read()), 'utf-8')
            res = requests.get(images)
            images_data = str(base64.b64encode(res.content), 'utf-8')

        ## AutoLoad Model
        # url = 'http://10.6.80.87:8000/' + model_name + '/'
        # data_input = {'model': model_name, 'prompt': prompt, 'img_data': images_data}
        # data = requests.post(url, json=data_input)
        # print(data.text)
        # return data.text
        # 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'

        ## Lmdeploy
        if not user_context:
            user_context = []
            # user_context = [{'role': 'user', 'content': '你好'}, {'role': 'assistant', 'content': '你好!很高兴为你提供帮助。'}]
        api_client = APIClient(self.url)
        model_name = api_client.available_models[0]
        messages = user_context + [{
            'role': 'user',
            'content': [{
                'type': 'text',
                'text': prompt,
            }, {
                'type': 'image_url',
                'image_url': {
                    'url': f"data:image/jpeg;base64,{images_data}",
                    # './val_data/image_5.jpg',
                },
            }]
        }]
        responses = ''
        total_token_usage = 0  # can be used to count the cost of a query
        for i, item in enumerate(api_client.chat_completions_v1(model=model_name,
                                                                messages=messages)):  # , stream=True
            # print(item["choices"][0]["message"]['content'])
            responses += item["choices"][0]["message"]['content']
            total_token_usage += item['usage']['total_tokens']  # 'usage': {'prompt_tokens': *, 'total_tokens': *, 'completion_tokens': *}
        user_context = messages + [{'role': 'assistant', 'content': responses}]
        return responses, user_context

    async def fast_api_handler(self, request: Request) -> Response:
        json_request = True
        try:
            content_type = request.headers['content-type']
            if content_type == 'application/json':
                data = await request.json()
            else:
                data = await request.form()
                json_request = False
        except Exception:
            return JSONResponse(content={"error": "json parse error"}, status_code=status.HTTP_400_BAD_REQUEST)

        model_name = data.get("model_name")
        prompt = data.get("prompt")
        if json_request:
            img_data = data.get("img_data")
        else:
            # multipart upload: read the raw bytes of the uploaded file
            img_data = await data.get("img_data").read()

        if prompt is None:
            return JSONResponse(content={'error': "Question is required"}, status_code=status.HTTP_400_BAD_REQUEST)
        if model_name is None or model_name.isspace():
            model_name = "Qwen-VL-Chat"

        response, history = self.processing(prompt, img_data, model_name)
        # jsonresp = str(JSONResponse(content={"response": self.processing(prompt, img_data, model_name)}).body, "utf-8")
        return JSONResponse(content={"response": response, "history": history}, status_code=status.HTTP_200_OK)
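A hypothetical client for the handler above. The route `/vlms/` and the host/port are assumptions (how blackboxes are mounted is not part of this diff); the `model_name`, `prompt`, and `img_data` fields match what `fast_api_handler` reads:
```python
# Hypothetical client for the VLMS handler. The route "/vlms/" and the
# host/port are assumptions; the JSON field names are the ones the
# handler actually reads.
import base64
import requests

with open("tiger.jpeg", "rb") as f:  # any local test image
    img_b64 = str(base64.b64encode(f.read()), "utf-8")

resp = requests.post(
    "http://127.0.0.1:8000/vlms/",
    json={
        "model_name": "Qwen-VL-Chat",
        "prompt": "What animal is in this picture?",
        "img_data": img_b64,
    },
)
print(resp.json()["response"])  # model answer
print(resp.json()["history"])   # OpenAI-format conversation history
```
A multipart upload works the same way: send `prompt` and `model_name` as form fields and `img_data` as a file part, and the handler reads the file's raw bytes.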

@@ -1,131 +1,138 @@
from dataclasses import dataclass
from injector import inject, singleton
import yaml
import sys
import logging


@singleton
class Configuration():

    @inject
    def __init__(self) -> None:
        config_file_path = ""
        try:
            config_file_path = sys.argv[1]
        except:
            config_file_path = ".env.yaml"
        with open(config_file_path) as f:
            cfg = yaml.load(f, Loader=yaml.FullLoader)
        self.cfg = cfg

    def getDict(self):
        return self.cfg

    """
    # Path inside the yaml file: get("aaa.bbb.ccc")
    aaa:
      bbb:
        ccc: "hello world"
    """
    def get(self, path: str | list[str], cfg: dict = None, default=None):
        # `default` is threaded through every recursive call so that lookups
        # like get("log.level", default=...) actually fall back.
        if isinstance(path, str):
            if cfg is None:
                cfg = self.cfg
            return self.get(path.split("."), cfg, default)
        length = len(path)
        if length == 0 or not isinstance(cfg, dict):
            return default
        if length == 1:
            return cfg.get(path[0], default)
        return self.get(path[1:], cfg.get(path[0]), default)


class TesouConf():
    url: str

    @inject
    def __init__(self, config: Configuration) -> None:
        self.url = config.get("tesou.url")


class MeloConf():
    mode: str
    url: str
    speed: float  # e.g. 0.9 in .env.yaml
    device: str
    language: str
    speaker: str

    @inject
    def __init__(self, config: Configuration) -> None:
        self.mode = config.get("melotts.mode")
        self.url = config.get("melotts.url")
        self.speed = config.get("melotts.speed")
        self.device = config.get("melotts.device")
        self.language = config.get("melotts.language")
        self.speaker = config.get("melotts.speaker")


class CosyVoiceConf():
    mode: str
    url: str
    speed: float
    device: str
    language: str
    speaker: str

    @inject
    def __init__(self, config: Configuration) -> None:
        self.mode = config.get("cosyvoicetts.mode")
        self.url = config.get("cosyvoicetts.url")
        self.speed = config.get("cosyvoicetts.speed")
        self.device = config.get("cosyvoicetts.device")
        self.language = config.get("cosyvoicetts.language")
        self.speaker = config.get("cosyvoicetts.speaker")


# Level names accepted by logging._nameToLevel:
# 'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'
DEFAULT_LEVEL = "WARNING"
DEFAULT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


@singleton
class LogConf():
    level: int
    time_format = "%Y-%m-%d %H:%M:%S"
    filename: str | None

    @inject
    def __init__(self, config: Configuration) -> None:
        c = config.get("log.level", default=DEFAULT_LEVEL).upper()
        level = logging._nameToLevel.get(c)
        if level is None:
            self.level = logging.WARNING
        else:
            self.level = level
        self.filename = config.get("log.filename")
        self.time_format = config.get("log.time_format", default=DEFAULT_TIME_FORMAT)


@singleton
class EnvConf():
    version: str
    host: str
    port: str

    @inject
    def __init__(self, config: Configuration) -> None:
        self.version = "0.0.1"
        self.host = config.get("env.host", default="0.0.0.0")
        self.port = config.get("env.port", default="8080")


@singleton
@dataclass
class BlackboxConf():
    lazyloading: bool

    @inject
    def __init__(self, config: Configuration) -> None:
        self.lazyloading = bool(config.get("blackbox.lazyloading", default=False))


@singleton
class VLMConf():
    url: str

    @inject
    def __init__(self, config: Configuration) -> None:
        self.url = config.get("vlms.url")
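A short usage sketch: `injector` resolves `VLMConf` (and its `Configuration` dependency) automatically, assuming the `.env.yaml` from the README is present in the working directory:
```python
# Minimal sketch: resolve the config classes above through injector.
# Assumes .env.yaml (as shown in the README) is in the working directory.
from injector import Injector

injector = Injector()
config = injector.get(Configuration)
print(config.get("env.port"))                    # -> 8000
print(config.get("missing.key", default="n/a"))  # -> "n/a"

vlm_conf = injector.get(VLMConf)  # Configuration is injected automatically
print(vlm_conf.url)               # -> http://10.6.80.87:23333
```
Because `Configuration` and `VLMConf` are `@singleton`, repeated `injector.get` calls return the same instances, so the yaml file is only parsed once.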