OMO FUN

Bringing digital to shrine communities

Coding ❝Know a Little World News❞ (世界のニュースを少し知る)

June 4, 2023 | Last updated: July 14, 2023 | Coding
Django4 python3.10 wsl bootstrap googletrans gtts feedparser pydub

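I implemented this with googletrans and gTTS.

Both work right after installation, so they are easy to try. Be careful, though: some versions of googletrans do not work correctly.

gTTS produced natural-sounding Japanese speech.

For the news source, I chose to pull headlines from publicly available RSS feeds using feedparser.

The hardest parts were playing the audio (MP3) on the page without saving it to the database, and making it downloadable.

The result covers only about a third of what I originally intended, but it works reasonably well.

It is quite slow, though. Here is an overview.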

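Preparation (outline of steps)

<Setting up a virtual environment with Poetry (example)> Install Poetry beforehand.

Create the virtual environment inside the project folder (.venv)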

poetry config virtualenvs.in-project true

~$ mkdir o_poetry_dj
~$ cd o_poetry_dj/
~/o_poetry_dj$ poetry init

Answer the prompts as follows (Enter, yes, or no):

This command will guide you through creating your pyproject.toml config.
Package name [o_poetry_dj]: (press Enter)
Version [0.1.0]: (press Enter)
Description []: (press Enter)
Author [user , n to skip]: (press Enter)
License []: (press Enter)
Compatible Python versions [^3.10]: (press Enter)
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file
[tool.poetry]
name = "o_poetry_dj"
version = "0.1.0"
description = ""
authors = ["user "]
[tool.poetry.dependencies]
python = "^3.10"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Do you confirm generation? (yes/no) [yes] yes

poetry shell

<Using venv (example)> Make sure venv is available.

python3 -m venv app_text_process
source app_text_process/bin/activate

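From here on, work inside the virtual environment.

Install the required packages.

<With Poetry>

(venv-name) poetry add feedparser googletrans=="4.0.0-rc1" gtts pydub...

<With pip>

(venv-name) pip install feedparser googletrans=="4.0.0-rc1" gtts pydub...

pydub relies on ffmpeg to decode and encode MP3, so ffmpeg must also be installed on the system (for example, sudo apt install ffmpeg on WSL/Ubuntu).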

Create an app in the Django project

python manage.py startapp text_process

Create a superuser

python manage.py createsuperuser

googletrans worked reliably with version 4.0.0-rc1.

settings.py(config/settings.py)


INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    "text_process",
    # ... (omitted) ...
]

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [BASE_DIR / "templates"],
        # ... (omitted) ...
    },
]

STATIC_URL = 'static/'
STATICFILES_DIRS = [BASE_DIR / "static"]

MEDIA_URL = 'media/'
MEDIA_ROOT = BASE_DIR / 'media'

Add this to settings.py in the project folder.

urls.py(config/urls.py)


from django.contrib import admin
from django.urls import path, include
from django.conf import settings
from django.conf.urls.static import static


urlpatterns = [
    path('admin/', admin.site.urls),
    path("text_process/", include("text_process.urls")),
    # ... (omitted) ...
]

if settings.DEBUG:
    urlpatterns += static(settings.STATIC_URL, document_root=settings.STATIC_ROOT)
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

Add this to urls.py in the project folder.

Folder structure


├── text_process/
│   ├── admin.py
│   ├── apps.py
│   ├── forms.py
│   ├── __init__.py
│   ├── migrations/
│   ├── models.py
│   ├── __pycache__/
│   ├── tests.py
│   ├── urls.py
│   ├── utils_news.py
│   ├── utils.py
│   └── views.py

Add utils_news.py, which holds the RSS fetching, translation, and speech synthesis code.

utils_news.py

import base64
import urllib.parse
from io import BytesIO

import feedparser
from googletrans import Translator
from gtts import gTTS
from pydub import AudioSegment  # needs ffmpeg installed to read/write MP3


def get_feed(rss_url):
    """Fetch the first 5 entries of an RSS feed and translate their titles into Japanese.

    Returns (t_list, ja_text): t_list holds [Japanese title, original title, link]
    rows for the template; ja_text joins the Japanese titles with "。" for speech.
    """
    feed = feedparser.parse(rss_url)

    t_list = []
    ja_text = ""
    for entry in feed.entries[:5]:
        j_text = trans_sentence(str(entry.title))
        ja_text += "。" + j_text
        t_list.append([j_text, entry.title, entry.link])

    return t_list, ja_text


def trans_sentence(text):
    """Translate English text into Japanese with googletrans."""
    translator = Translator()
    message = translator.translate(text, src="en", dest="ja")
    return message.text


def speak_trans(text):
    """Synthesize Japanese speech with gTTS and return it as an MP3 data URI."""
    tts = gTTS(text=text, lang="ja", slow=False)

    # Write the MP3 into an in-memory file-like object instead of a file on disk
    fp = BytesIO()
    tts.write_to_fp(fp)
    fp.seek(0)

    # Load the MP3 with pydub, then export it again as MP3 bytes
    song = AudioSegment.from_file(fp, format="mp3")
    audio_data = song.export(format="mp3").read()

    # Encode the MP3 bytes as base64 text
    base64_audio = base64.b64encode(audio_data).decode("utf-8")

    # Build a data URI for the <audio> tag; quote_plus percent-encodes "+", "/" and "="
    song_64 = "data:audio/mpeg;base64," + urllib.parse.quote_plus(base64_audio)

    return song_64

RSS fetching: get_feed()
Translation: trans_sentence()
Speech synthesis: speak_trans()
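Because get_feed() calls Google Translate for every title, it is hard to try offline. The sketch below pulls the same loop out and takes the translator as a parameter, so the shape of the data can be checked without network access. Entry and build_news are hypothetical names; Entry only stands in for a feedparser entry's .title and .link.

```python
from itertools import islice
from typing import Callable, Iterable, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical stand-in for a feedparser entry (only .title and .link are used)."""
    title: str
    link: str


def build_news(
    entries: Iterable[Entry],
    translate: Callable[[str], str],
    limit: int = 5,
) -> Tuple[List[List[str]], str]:
    """Same loop as get_feed(): returns ([ja_title, title, link] rows, joined ja text)."""
    rows: List[List[str]] = []
    ja_text = ""
    for entry in islice(entries, limit):
        j_text = translate(str(entry.title))
        ja_text += "。" + j_text  # get_feed() joins titles with "。" for speech
        rows.append([j_text, entry.title, entry.link])
    return rows, ja_text


if __name__ == "__main__":
    fake = [Entry("Hello", "https://example.com/1"), Entry("World", "https://example.com/2")]
    rows, text = build_news(fake, translate=str.upper)  # offline stand-in translator
    print(rows)  # [['HELLO', 'Hello', 'https://example.com/1'], ['WORLD', 'World', 'https://example.com/2']]
    print(text)  # 。HELLO。WORLD
```

Each row maps to value.0, value.1 and value.2 in the news.html table.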

speak_trans() returns the synthesized speech as a base64 data URI instead of saving a file.
I referred to the following site for this approach.

Reference: https://blog.furas.pl/python-how-to-play-mp3-from-gtts-as-bytes-without-saving-on-disk-gb.html

For the speech synthesis itself, I referred to the following site.

Reference: https://self-development.info/%E3%80%90python%E3%81%A7%E9%9F%B3%E5%A3%B0%E5%90%88%E6%88%90%EF%BC%88%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88%E8%AA%AD%E3%81%BF%E4%B8%8A%E3%81%92%EF%BC%89%E3%80%91gtts%E3%81%AE%E3%82%A4%E3%83%B3%E3%82%B9/
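gTTS already writes MP3 data, so one possible variant skips the pydub decode/re-encode and wraps the bytes from write_to_fp() directly. The sketch below assumes that; it also builds an anchor with the standard HTML download attribute, which is one way to offer a download without storing the file. The function names are hypothetical.

```python
import base64
import html


def mp3_to_data_uri(mp3_bytes: bytes) -> str:
    """Wrap raw MP3 bytes in a data: URI usable as an <audio>/<source> src."""
    # Standard base64 is valid inside a data: URI, so no extra URL quoting is applied
    encoded = base64.b64encode(mp3_bytes).decode("ascii")
    return "data:audio/mpeg;base64," + encoded


def download_link(data_uri: str, filename: str = "news.mp3") -> str:
    """Build an <a download> tag so the browser saves the audio as `filename`."""
    return (
        f'<a href="{html.escape(data_uri)}" download="{html.escape(filename)}">'
        "Download MP3</a>"
    )


# Assumed usage with gTTS (needs network access, so it is not executed here):
#   fp = BytesIO()
#   gTTS(text=ja_text, lang="ja").write_to_fp(fp)
#   sound = mp3_to_data_uri(fp.getvalue())
```

In a Django template the same link could be written as <a href="{{ sound }}" download="news.mp3">. Very large data URIs may behave differently across browsers, so this suits short clips like five headlines.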

forms.py


from django import forms

class NewsChoiceForm(forms.Form):
    DATA_CHOICE = (
        ("rss_bbc", "BBC"),
        ("rss_cnn", "CNN"),
        ("rss_nyt", "New York Times")
    )
    
    cho = forms.ChoiceField(label="選択中のメディア：ニュース5件", choices=DATA_CHOICE)

This is a simple form with a single ChoiceField.
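The choice keys (rss_bbc, rss_cnn, rss_nyt) appear again in the if/elif chain in views.py, so adding a source means editing two files. One possible refactor keeps keys, labels, and URLs in a single dict. NEWS_SOURCES and feed_url_for are hypothetical names; the URLs are the ones used in views.py.

```python
from typing import Optional

# Single source of truth: choice key -> (label, RSS URL)
NEWS_SOURCES = {
    "rss_bbc": ("BBC", "http://feeds.bbci.co.uk/news/world/rss.xml"),
    "rss_cnn": ("CNN", "http://rss.cnn.com/rss/money_news_international.rss"),
    "rss_nyt": ("New York Times", "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"),
}
DEFAULT_SOURCE = "rss_bbc"

# (key, label) pairs in the shape ChoiceField(choices=...) expects
DATA_CHOICE = tuple((key, label) for key, (label, _url) in NEWS_SOURCES.items())


def feed_url_for(choice: Optional[str]) -> str:
    """Return the RSS URL for a choice key, falling back to BBC like the original view."""
    key = choice if choice in NEWS_SOURCES else DEFAULT_SOURCE
    return NEWS_SOURCES[key][1]
```

With this, the form could use choices=DATA_CHOICE and the view could call feed_url_for(request.GET.get("cho")).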

views.py

from django.shortcuts import render

from .forms import NewsChoiceForm

from .utils_news import get_feed, speak_trans



def news_home(request):
    
    form = NewsChoiceForm()
    
    choice = request.GET.get("cho")
    if choice == "rss_bbc":
        choice_val = "http://feeds.bbci.co.uk/news/world/rss.xml"
    elif choice == "rss_cnn":
        choice_val = "http://rss.cnn.com/rss/money_news_international.rss"
    elif choice == "rss_nyt":
        choice_val = "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml"
    else:
        choice_val = "http://feeds.bbci.co.uk/news/world/rss.xml"

    # ja_text: the translated titles joined with "。", used as the speech input
    feed_list, ja_text = get_feed(choice_val)

    sound = speak_trans(ja_text)

    context = {
        "news": feed_list,
        "all": ja_text,
        "sound": sound,
        "form": form,
    }
    
    return render(request, "text_process/news.html", context)
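Every request re-fetches the feed, calls the translator five times, and synthesizes speech, which likely accounts for much of the slowness mentioned at the top. A minimal mitigation, assuming a few minutes of staleness is acceptable, is to cache the result per RSS URL. The TTLCache class below is a hypothetical, stdlib-only sketch; it keeps data per process and uses no locking, so concurrent requests may occasionally compute twice. Django's own cache framework (django.core.cache) would be the alternative for multi-process deployments.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache: remembers each key's value for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Any, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Any, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the slow fetch/translate/speech work
        value = compute()
        self._store[key] = (now, value)
        return value


# Hypothetical use inside news_home():
#   NEWS_CACHE = TTLCache(ttl=600)  # module level, 10 minutes
#   def build(url):
#       feed_list, ja_text = get_feed(url)
#       return feed_list, ja_text, speak_trans(ja_text)
#   feed_list, ja_text, sound = NEWS_CACHE.get_or_compute(choice_val, lambda: build(choice_val))
```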

templates/base.html

*This is a sample template; adapt it to your needs.


{% load static %}
<!doctype html>
<html lang="ja">
    <head>
        <!-- Required meta tags -->
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">

        <!-- Bootstrap CSS -->
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css" integrity="sha384-EVSTQN3/azprG1Anm3QDgpJLIm9Nao0Yz1ztcQTwFspd3yD65VohhpuuCOmLASjC" crossorigin="anonymous">
        
        <!-- custom css & js -->
        <link rel="stylesheet" href="{% static 'style.css' %}">
        <script src="{% static 'main.js' %}" defer></script>
        
        
        <title>Spiner | {% block title %}{% endblock title %}</title>
    </head>
    <body>
        <div class="container mt-3">
            {% block contents %}
            {% endblock contents %}
        </div>
        <!-- Optional JavaScript -->
        <!-- jQuery first, then Popper.js, then Bootstrap JS-->
        <script
        src="https://code.jquery.com/jquery-3.6.3.min.js"
        integrity="sha256-pvPw+upLPUjgMXY0G+8O0xUf+/Im1MZjXxxgOcBQBXU="
        crossorigin="anonymous"></script>
        <script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2.9.2/dist/umd/popper.min.js" integrity="sha384-IQsoLXl5PILFhosVNubq5LC7Qb9DXgDA9i+tQ8Zj3iwWAwPtgFTxbJ8NT4GN1R8p" crossorigin="anonymous"></script>
        <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/js/bootstrap.min.js" integrity="sha384-cVKIPhGWiC2Al4u+LWgxfKTRIcfu0JTxR+EQDz/bgldoEyl4H0zUF0QKbrJ0EcQF" crossorigin="anonymous"></script>
        {% block end_scripts %}{% endblock end_scripts %}
    </body>
</html>

templates/text_process/news.html

The template includes JavaScript that reads the GET parameter (cho) and re-selects the matching option in the select box.

{% extends 'base.html' %}
{% load static %}

{% block title %}世界のニュースを少し知る-読み上げ-{% endblock title %}


{% block contents %}
    <div class="row">
        <div class="col">
          <h3 class="mb-3">世界のニュースを少し知る-読み上げ-</h3>

          <form method="GET" action="{% url 'text_process:news' %}" name="news_form">
            {{ form.as_p }}
            {% comment %} {% for field in form %}
                <div class="form-group">
                    {{ field.label_tag }}
                    {% render_field field class="form-control" style="border:none;" %}
                    {% if field.help_text %}
                        <small class="form-text text-muted">
                            {{ field.help_text }}
                        </small>
                    {% endif %}
                </div>
            {% endfor %} {% endcomment %}
            
            <button class="btn btn-dark rounded-capsule mb-3" onclick="clickBtn1()">表示</button>
        </form>

          <p>gTTS 和訳を読み上げ</p>
            <audio controls>
                <source src="{{sound}}" type="audio/mpeg">
                Your browser does not support audio
            </audio>
            <table class="table mt-3">
                <thead>
                  <tr>
                    <th scope="col">和訳</th>
                    <th scope="col">Title</th>
                    <th scope="col">URL</th>
                  </tr>
                </thead>
                <tbody>
                {% for value in news %}
                  <tr>
                        <td>{{value.0}}</td>
                        <td>{{value.1}}</td>
                        <td>{{value.2}}</td>
                  </tr>
                {% endfor %} 
                </tbody>
              </table>
        </div>
    </div>
{% endblock contents %}

{% block end_scripts %}
<script>
  // Read the 'cho' GET parameter and re-select the matching option in the select box

  // Get the URL query string
  const queryString = window.location.search;
  // Create a URLSearchParams object
  const urlParams = new URLSearchParams(queryString);
  // Get the value of the 'cho' parameter
  const selectboxValue = urlParams.get('cho');

  console.log(selectboxValue);

  // Get the <select> element rendered by Django (id="id_cho")
  const selectbox = document.getElementById('id_cho');

  // Select the option whose value matches the parameter
  for (let i = 0; i < selectbox.options.length; i++) {
    if (selectbox.options[i].value === selectboxValue) {
      selectbox.options[i].selected = true;
      const txt = selectbox.options[i].text;
      console.log(txt);
      break;
    }
  }
</script>
{% endblock end_scripts %}
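The JavaScript above re-selects the option on the client. The same decision can also be made on the server: in the Django view, creating the form as NewsChoiceForm(initial={"cho": choice}) should make the rendered select box preselect that option, which would make the script optional. The helper below is a hypothetical, stdlib-only sketch of the lookup plus validation; inside a real view, request.GET.get("cho") already does the parsing.

```python
from typing import Iterable
from urllib.parse import parse_qs


def selected_choice(query_string: str, allowed: Iterable[str], default: str) -> str:
    """Return the 'cho' value from a query string if it is an allowed key, else `default`."""
    values = parse_qs(query_string.lstrip("?")).get("cho", [])
    if values and values[0] in set(allowed):
        return values[0]
    return default


if __name__ == "__main__":
    keys = ["rss_bbc", "rss_cnn", "rss_nyt"]
    print(selected_choice("?cho=rss_nyt", keys, "rss_bbc"))  # rss_nyt
    print(selected_choice("?cho=unknown", keys, "rss_bbc"))  # rss_bbc
```

Validating against the known keys means an arbitrary cho value falls back to the default source instead of reaching the feed lookup.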